Topic 3 - Binding Time and Symbol Tables Dr. William A. Maniatty Assistant Prof. Dept. of Computer Science University At Albany CSI 511 Programming Languages and Systems Concepts Fall 2002 Monday Wednesday 2:30-3:50 LI 99
Introduction to Binding • Binding refers to associating an entity with a value, such as • Variable name with address • Result of expression with ephemeral storage • Constant with its value • Separately compiled function with address
Binding Time • Binding time refers to when the association between an entity and its value is made.
Design Binding Times • There are extra binding times available to programming language designers. • Language Design Time - Choose fundamental primitives, reserved words, etc. • Compiler/Interpreter Implementation Time - How to internally represent language constructs. • Programming Time - Language users pick the algorithms and data structures.
Object - What does it mean? • The word Object has many meanings in programming languages. • Object Module - A compiled (but not linked) module of a program. • Object (OOP sense) - An instance of a class in Object Oriented Programming. • Object (Programming Language Sense) - The entities which are bound to values. • We use the programming language sense for now.
Binding Time Design Issues • Late binding of objects suggests an interpreted implementation. • Dynamic Type Systems • Care needs to be taken to avoid ambiguity when binding. • Name Space Collisions • Polymorphism (Overloading)
Object Attributes • Objects have many attributes • Lifetime (Persistence) • Type • Scope • Value/Address • Language should: • Precisely specify attributes • Be Orthogonal - Separate Controls
Object Persistence vs. Lifetime • Persistence - Persistent objects last longer than the process that created them. • Examples - Files, databases. • Memory for nonpersistent objects is called volatile (you lose data if powered down). • Lifetime - When is the storage allocated to an object available?
Events Impacting Object Lifetime • Lifetime has several aspects. • Creation of objects • Creation of bindings • References to variables/subroutines/types/etc. • (Re)activation and Deactivation of bindings • Destruction of bindings • Destruction of Objects
Allocation and Object Lifetime • How can objects be allocated? • Statically - Exist during the program's lifetime • Stack - Used for ephemeral objects such as local variables and activation records • Heap Objects - Have controlled lifetimes • Deallocation: How is it indicated? • Explicitly - Destructors/free/delete • Implicitly - Garbage collection • Initialization - Separate (Constructors)
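A minimal C++ sketch (names are illustrative) showing the three allocation mechanisms and the lifetimes they imply:

```cpp
#include <iostream>

int global_counter = 0;            // static allocation: exists for the program's lifetime

void demo() {
    int local = 42;                // stack allocation: lives until demo() returns
    int* heap_obj = new int(7);    // heap allocation: lifetime controlled by the programmer
    std::cout << global_counter << " " << local << " " << *heap_obj << "\n";
    delete heap_obj;               // explicit deallocation (other languages use garbage collection)
}

int main() { demo(); }
```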
Static Allocation • Done at compile time • Literals (and constants) bound to values • Variables bound to addresses • Compiler notes undefined symbols • Library functions • Global Variables and System Constants • Linker (and loader if DLLs used) resolve undefined references.
Stack Based Allocation • Stack Layout determined at compile time • Variables bound to offsets from top of stack. • Layout called stack frame or activation record • Compilers use registers • Function parameters and results need consistent treatment across modules • C/C++ use prototypes • Eiffel/Java/Oberon use single definition
Parameter Passing Conventions • Actual Parameters - at the call site • Formal Parameters - at the subroutine declaration • Address - a memory location; data objects that hold addresses are called: • Pointer - use explicit dereferencing operation. • Reference - use implicit dereferencing.
Parameter Passing Conventions • Call by value - Copy to the function • Call by reference - Pass reference • Call by address - Pass address to function • Call by result - Pass result back to caller • Call by value result - Copy inputs to the function and copy results to caller. • Parameters can be on stack or in registers.
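A small C++ sketch (the functions are hypothetical) contrasting three of these conventions:

```cpp
#include <iostream>

void by_value(int x)      { x = 99; }     // callee gets a copy; caller is unaffected
void by_address(int* x)   { *x = 99; }    // callee gets an address; explicit dereference
void by_reference(int& x) { x = 99; }     // callee gets a reference; implicit dereference

int main() {
    int a = 1, b = 1, c = 1;
    by_value(a);        // a is still 1
    by_address(&b);     // b is now 99
    by_reference(c);    // c is now 99
    std::cout << a << " " << b << " " << c << "\n";   // prints: 1 99 99
}
```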
Call Site Code Generation for Stack Allocation • Call Setup • Push Register Values on stack (if caller saves) • Push parameters on stack (or load into registers) • Call Function • Push Return Address on stack • Goto Function's Start Address • Call Cleanup (if caller saves)
Subroutine Code Generation for Stack Allocation • Prologue - Push registers that will be overwritten on stack (if callee saves) • Body of function • Epilogue • Copy results (if any) • Pop parameters off stack. • Pop saved registers • Return
Stack and Frame Layout • Stack here grows toward low addresses.
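The original layout figure is not reproduced here. As a rough stand-in, the struct below sketches the fields a typical activation record holds; the exact layout and field order are compiler- and calling-convention-specific.

```cpp
// Illustrative only: real compilers lay frames out as stack slots and registers,
// not as a C++ struct, and the order/contents vary by ABI.
struct ActivationRecord {
    void* return_address;     // where to resume in the caller
    void* saved_frame_ptr;    // caller's frame pointer (dynamic link)
    void* static_link;        // frame of lexically enclosing routine (for nested scopes)
    long  saved_registers[4]; // callee-saved registers (count is hypothetical)
    long  locals[8];          // locals and temporaries (sizes fixed at compile time)
    // actual parameters typically sit just beyond the frame, per convention
};
```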
Heap Allocation • Heap provides dynamic memory management. • Not to be confused with the binary heap or binomial heap data structures. • Under the hood, may periodically need to request additional memory from the O/S. • Requests large regions at a time (O/S requests are expensive). • Done using a library (e.g. C) • Or as part of the language (C++, Java, Lisp).
Heap Data Structures • Must track allocated/Free Memory. • Metadata is added (pointers, size, etc).
Memory Management • Holes can form where memory is freed. • Coalesce adjacent holes • Small holes fragment the memory. • Suppose we allocate a smaller chunk; which hole do we take it from? • First fit - The first hole found that it fits into • Best fit - The smallest segment it fits into • Worst fit - The largest segment it fits into
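A sketch of the per-block metadata and a first-fit search, assuming a singly linked free list (simplified: no splitting, coalescing, or alignment):

```cpp
#include <cstddef>

// Each free region carries metadata: its size and a link to the next hole.
struct FreeBlock {
    std::size_t size;
    FreeBlock*  next;
};

// First fit: return the first hole large enough for the request (or nullptr).
// Best/worst fit would instead scan the whole list for the smallest/largest fit.
FreeBlock* first_fit(FreeBlock* free_list, std::size_t request) {
    for (FreeBlock* b = free_list; b != nullptr; b = b->next)
        if (b->size >= request)
            return b;
    return nullptr;
}
```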
When to Free Memory • Depends on language. • Explicit deallocation -needed for library approaches (e.g. C). • Implicit Deallocation - aka garbage collection • Garbage is unreferenced memory. • Compaction moves allocated memory to contiguous addresses (coalescing all holes). • Can cause timing variations (care is needed in real time systems).
Speeding Up Searching for a Free Block • Recall that all fitting schemes require finding sufficiently large blocks. • Idea: Organize the free list according to block size. • Fibonacci Heap - Use Fibonacci numbers for block sizes. • Buddy System - Use block sizes of 2^k
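A sketch of the 2^k buddy-system bookkeeping: requests are rounded up to a power of two, and a block's buddy is found by flipping the bit corresponding to its size (addresses are treated as offsets from the heap base; this is a simplification):

```cpp
#include <cstddef>

// Round a request up to the next power of two (the block sizes a buddy system uses).
std::size_t round_up_pow2(std::size_t n) {
    std::size_t s = 1;
    while (s < n) s <<= 1;
    return s;
}

// For a block of size 2^k at offset 'off', its buddy is at off XOR 2^k;
// this is what makes coalescing freed buddies cheap.
std::size_t buddy_offset(std::size_t off, std::size_t block_size) {
    return off ^ block_size;
}
```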
Introduction to Scope • Scope refers to the region of a program during which a binding is active. • Consider the following code segment, what should the output be?
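The slide's code segment is not preserved in this transcription; the C++ sketch below reconstructs the classic example that the answers on the next slide (output 1 vs. 2) refer to. C++ is statically scoped, so the dynamic-scope outcome is described in comments.

```cpp
#include <iostream>

int x = 1;                       // outer binding of x

void print_x() {
    // Static (lexical) scope: x resolves to the global above, so this prints 1.
    // Dynamic scope would resolve x in the most recent active frame that binds it,
    // so when called from f() below it would print 2 instead.
    std::cout << x << "\n";
}

void f() {
    int x = 2;                   // a new binding of x, local to f
    (void)x;                     // unused under static scoping; silences the warning
    print_x();
}

int main() { f(); }              // prints 1 under static scope; 2 under dynamic scope
```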
Scope Rules • Two popular answers to the problem. • Static (lexical) scope -Use compile time analysis. Normally in block structured languages, the containing scope is preferred, output is 1 in this case. • Dynamic Scope -Value found at run time by resolving to nearest stack frame in which the value is defined, output is 2 in this case. • Lexical scope is more popular.
Variants of Static Scope • Single Global Scope (BASIC) - simplest • Global and Local (Fortran) • Fortran Common Blocks • Supports separate compilation • Gives base address of region • Each program specifies (possibly different) layout • Block Structured (Pascal)
Modules and Separate Compilation • Modules support encapsulation (much like classes). • Found in Modula 2, Euclid, Oberon and Ada. • For separate compilation • define interfaces (data and subroutines) • Export statements - publish interfaces • Import statements - use published interfaces • Classes are extensions of modules
More Notation • Fundamental question: Does the scope need to be explicitly imported to be visible? • Yes - Referred to as closed scope. • No - Referred to as open scope. • Aliasing -having more than one way to refer to the same object.
Classes and Scope • Classes provide encapsulation in object oriented programming (OOP). • Supports aggregating heterogeneous data and operations together. • Interfaces are published • C++ public section in classes • Internals can be hidden (ala private section in C++) • Constructors and destructors supported.
OOP Features • I think of OOP as providing • Encapsulation -groups data with operations • Inheritance -permits extension of more general base classes (and overriding behaviors) • Polymorphism (overloading) - allows operators/subroutines to have behaviors dependent on the types of arguments and results expected.
Dynamic Scope • Dynamic scoping prefers the instance defined in the most recently invoked function. • Not very popular currently (hard to debug) • Found in interpreted languages (APL, older Lisp dialects, e.g. EMACS Lisp). • Fans claim that it makes customizing subroutines easier.
Symbol Table Design Criteria • Symbol tables require: • Fast insertion • Fast lookup • Occasional deletion (should be fast). • Which motivates the use of hash tables. • But ordinary hash tables are not good with nesting (ala classes/records/subroutines)
Operations on Symbol Tables (Static Scope) • A Symbol Table should support: • Entering Scope • Leaving Scope • Inserting a symbol (with scope information) • Looking up a symbol (with scope information) • It is often useful to store symbol table in object/executables • e.g. For debugging or source level analysis
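A sketch of the interface such a symbol table might expose (a hypothetical C++ signature set, not any particular compiler's API):

```cpp
#include <string>
#include <optional>

struct SymbolInfo {
    std::string name;
    int         scope;      // serial number of the declaring scope
    // type, address/offset, etc. would also live here
};

class SymbolTable {
public:
    void enter_scope();                                  // a new scope becomes current
    void leave_scope();                                  // the current scope closes
    void insert(const std::string& name);                // bind name in the current scope
    std::optional<SymbolInfo> lookup(const std::string& name) const;  // resolve per scope rules
};
```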
LeBlanc-Cook Symbol Table Lookup 1/5 • LeBlanc-Cook Symbol Table Lookup • Each Scope is assigned a serial number • Elements are never deleted from the table • A Scope Counter is maintained • The first scope is 0 • Every new scope encountered increments the counter • To track nesting, a scope stack is maintained. • Push to enter scope, pop when leaving scope
LeBlanc-Cook Symbol Table Lookup 2/5 • Put all symbols in a single hash table. • Keywords not inserted (can use another hash). • Entries indexed using both name and scope. • To lookup a name • Look in the hash table for (name,scope) pair. • If not found: • Parent scope is found using stack • Test if parent scope is open or exports symbol
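A sketch of that lookup in C++, assuming entries keyed by (name, scope) and a scope stack of serial numbers; an ordered map stands in for the hash table, and the open/closed-scope check is only noted in a comment:

```cpp
#include <map>
#include <vector>
#include <string>
#include <utility>
#include <optional>

// One entry per (name, scope) pair; scopes get serial numbers and are never deleted.
struct Entry { std::string name; int scope; /* type, offset, ... */ };

std::map<std::pair<std::string, int>, Entry> table;  // stands in for the hash table
std::vector<int> scope_stack;                        // innermost scope on top
int scope_counter = 0;                               // next serial number to hand out

void enter_scope() { scope_stack.push_back(scope_counter++); }
void leave_scope() { scope_stack.pop_back(); }       // entries stay in the table

void insert(const std::string& name) {
    int s = scope_stack.back();
    table[{name, s}] = Entry{name, s};
}

// Try the innermost scope first, then walk outward through the scopes on the stack.
std::optional<Entry> lookup(const std::string& name) {
    for (auto it = scope_stack.rbegin(); it != scope_stack.rend(); ++it) {
        auto found = table.find({name, *it});
        if (found != table.end())
            return found->second;
        // a fuller version would also test whether the enclosing scope is open
        // or explicitly exports this symbol before continuing outward
    }
    return std::nullopt;
}
```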
LeBlanc-Cook Symbol Table Lookup 3/5 • About Hashing and Hash Functions: • Is the universe of keys known in advance? • Yes - perfect minimal hashing may be possible. • No - must handle collisions • e.g. Quadratic Rehash or Chaining • Symbol Table Algorithm has to handle collisions if hashing is used.
Dynamic Scope and Symbol Table Management • Dynamic scope has different symbol table management needs than static scope • Needs insert, lookup, enter scope, leave scope. • Just like static scope • Competing Approaches: simplicity vs. speed • Association Lists - Simple, fast scope entry/exit. • Central Reference Table - Like LeBlanc-Cook sans the scope stack. • Faster lookup (common case?), slower scope entry/exit.
Association Lists • Association Lists (A-Lists) combine list and stack treatment. • When a new scope is entered • Push its symbols on the stack • Use a unidirectional linked list to implement stack. • To find an item • Scan stack starting at top of stack. • When leaving a scope • Pop all symbols in scope from the stack.
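A minimal C++ sketch of an A-list as a linked stack of bindings (the payload fields are placeholders):

```cpp
#include <string>

// Association list: a stack of (name, binding) pairs as a singly linked list;
// the head is the top of the stack, i.e. the most recently pushed binding.
struct ANode {
    std::string name;
    int         value;      // stands in for whatever the name is bound to
    int         scope;      // which scope pushed this binding
    ANode*      next;
};

// Entering a scope pushes its bindings; lookup scans from the top, so the
// dynamically innermost binding wins.
ANode* push(ANode* top, std::string name, int value, int scope) {
    return new ANode{std::move(name), value, scope, top};
}

ANode* find(ANode* top, const std::string& name) {
    for (ANode* p = top; p != nullptr; p = p->next)
        if (p->name == name) return p;
    return nullptr;
}

// Leaving a scope pops every binding that scope pushed.
ANode* pop_scope(ANode* top, int scope) {
    while (top != nullptr && top->scope == scope) {
        ANode* dead = top;
        top = top->next;
        delete dead;
    }
    return top;
}
```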
Central Reference Tables (1) • Central Reference Tables use hashing • Elements are keyed by symbol • Each element is a stack • So we have one stack per symbol • Newest Scope is on top • Use a unidirectional linked list to implement stack.
Central Reference Tables (2) • To insert a symbol/scope • Hash on symbol, push symbol/scope on stack. • To find a symbol in a scope • Hash to symbol's stack • Use scope at top of stack. • When leaving scope • Pop all symbols in that scope from top of their respective stacks.
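A sketch of a central reference table in C++, using one vector-as-stack per symbol (a real implementation would keep a per-scope list of rebound symbols so leaving a scope need not scan the whole table):

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// One stack of bindings per symbol, keyed by the symbol's name;
// the top of each stack is the binding in the newest scope.
struct Binding { int scope; int value; };

std::unordered_map<std::string, std::vector<Binding>> table;

void insert(const std::string& name, int scope, int value) {
    table[name].push_back({scope, value});        // push onto that symbol's stack
}

const Binding* find(const std::string& name) {
    auto it = table.find(name);
    if (it == table.end() || it->second.empty()) return nullptr;
    return &it->second.back();                    // current binding: top of the stack
}

// Leaving a scope pops that scope's binding from every symbol it (re)bound.
void leave_scope(int scope) {
    for (auto& [name, stack] : table)
        if (!stack.empty() && stack.back().scope == scope)
            stack.pop_back();
}
```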
Resolving Static Scope at Run Time • Consider a function F containing G. • i.e. F and G are nested functions • Suppose G uses an identifier in F's scope. • How can G find F's frame pointer at run time? • If G is always invoked directly by F, just do base + offset • Called static chaining - offset computed at compile time. • But what if G is separated from F by intervening (e.g. recursive) invocations? • Use pointer jumping (exploit transitivity and associativity) • Called dynamic chaining - requires run time support
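A sketch of the static-link walk, with a struct standing in for a frame (illustrative only; compilers keep this in stack slots and registers):

```cpp
// Each frame carries a pointer to the frame of its lexically enclosing routine.
struct Frame {
    Frame* static_link;   // frame of the lexically enclosing routine (F, for G)
    int    local;         // some variable declared in that routine
};

// To reach a variable declared n lexical levels out, follow n static links.
// When recursive calls separate G from F, the chain still leads to the right
// instance of F's frame, at the cost of a run-time pointer walk.
int* nonlocal(Frame* current, int levels_out) {
    Frame* f = current;
    for (int i = 0; i < levels_out; ++i)
        f = f->static_link;
    return &f->local;
}
```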
Subroutine Closures • Consider when a function, F, is passed as an argument to another function, G • E.g. Comparison Operators for sorting • When G invokes F, how can we determine the scope? • Subroutine closures describe a function's scope and instruction space address
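A hand-rolled C++ sketch of the idea (code address plus referencing environment); the names and the counting example are illustrative, not a standard API:

```cpp
#include <iostream>

struct Env { int threshold; };                     // stands in for F's frame

// A closure pairs the subroutine's code address with the environment it should use.
struct Closure {
    bool (*code)(const Env*, int);                 // instruction-space address
    const Env* env;                                // referencing environment
};

bool above(const Env* env, int x) { return x > env->threshold; }

// G receives the closure and invokes it without knowing where env came from.
int count_if(const int* v, int n, Closure pred) {
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (pred.code(pred.env, v[i])) ++count;
    return count;
}

int main() {
    int data[] = {1, 5, 9, 3};
    Env env{4};                                    // the "captured" scope
    std::cout << count_if(data, 4, Closure{above, &env}) << "\n";  // prints 2
}
```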
Overloading Defined • An overloaded function or operator selects its semantics based on the types of its parameters and result • Implicit overloading - provided by the language • e.g. addition in Pascal can handle reals or integers • Write and Writeln in Pascal • Explicit overloading - programmers resolve actions • e.g. Overloaded operators and methods in C++
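A small C++ sketch contrasting the two (the Complex type is hypothetical):

```cpp
#include <iostream>

struct Complex { double re, im; };

// Explicit overloading: the same operator symbol gets a user-defined meaning
// for a new type, selected by the compiler from the operand types.
Complex operator*(const Complex& a, const Complex& b) {
    return { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
}

int main() {
    int    i = 2 * 3;                            // built-in (implicitly overloaded) multiply
    double d = 2.0 * 3.0;                        // same symbol, different built-in meaning
    Complex c = Complex{1, 2} * Complex{3, 4};   // user-defined meaning
    std::cout << i << " " << d << " " << c.re << "+" << c.im << "i\n";
}
```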
Some thoughts on Overloading • Should user defined overloading of operators be permitted? • Pro: Permits consistent interface • e.g. A = B * C; good for integer, real, complex ... • Con: You may need to read the entire program to understand a single line of code. • e.g. A = B * C; What if B and C are objects? Inheritance? • What to do with ephemeral objects? e.g. A * B * C
More Thoughts • Meyer's Eiffel overloads A(i) • Single parameter function • Single index array • Because functions and arrays are often interchangeable! • Operator vs. function overloading • Operator - Syntactic Sugar • Function - Programmers know to read code
Challenges of Overloading • Compiler needs to be smart about types • Separate compilation is hard • e.g. Unix Linker - Predates C++ • Name Mangling • Can break system tools (profilers/debuggers) • Compiler creates a unique name based on operator/function name and parameter/result types. • No standard defined • Hard to link code compiled by different compilers
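A brief C++ illustration; the mangled form shown is compiler/ABI-specific and given only as an example:

```cpp
// Under the Itanium C++ ABI used by g++ and clang, add(int, int) below typically
// becomes a symbol like _Z3addii, encoding the name and parameter types.
int add(int a, int b) { return a + b; }

// extern "C" suppresses C++ name mangling so the symbol keeps its plain name
// and can be linked against by pre-C++ tools or called from C code.
extern "C" int add_c(int a, int b) { return a + b; }
```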