Compiler Construction

Compiler Construction: Run-Time Environments

Run-Time Environments (Chapter 7), Continued: Access to Nonlocal Names

Non-locals • Assume we have stack allocation of activation records. • SCOPE RULES of the source language determine how we handle non-local references. • Most languages use LEXICAL (also called STATIC) scoping. • Lexical scoping means it is possible to determine the declaration corresponding to a reference just by examining the program. • Pascal, C, Ada, etc. use static scoping. • Languages with DYNAMIC scoping require examination of the stack, at runtime, to find the right declaration.

Block structure • C and many other languages have BLOCKs: stmt -> block | … and block -> { decls stmts } • The scope of a declaration in a block follows the MOST-CLOSELY-NESTED rule: • The scope of a declaration in block B includes B. • If “x” is referred to but not declared in B, then that reference is in the scope of a declaration in an enclosing block B’ such that • B’ has a declaration of “x” and • B’ is more closely nested around B than any other block with a declaration of “x”.

C program with blocks • [Figure: a C program with nested blocks and a Declaration/Scope table. Declarations listed: int a = 0 (scope B0-B2); int b = 0; int b = 1; int a = 2; int b = 3.] • What is the output?
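The program for this slide did not survive, so the following is only a sketch of one program that fits the five listed declarations. The block nesting (B1 inside B0, with B2 and B3 side by side inside B1), the function name block_demo, and the choice to collect the output in a buffer are all assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical reconstruction: the nesting of the blocks is an assumption.
 * Appends one "a b" line to out at each point where the values are read. */
void block_demo(char *out, size_t size) {
    int a = 0;                          /* declared in B0 */
    int b = 0;                          /* declared in B0 */
    int n = 0;
    {
        int b = 1;                      /* B1: hides the b of B0 */
        {
            int a = 2;                  /* B2: hides the a of B0 */
            n += snprintf(out + n, size - (size_t)n, "%d %d\n", a, b);
        }
        {
            int b = 3;                  /* B3: hides the b of B1 */
            n += snprintf(out + n, size - (size_t)n, "%d %d\n", a, b);
        }
        n += snprintf(out + n, size - (size_t)n, "%d %d\n", a, b);
    }
    snprintf(out + n, size - (size_t)n, "%d %d\n", a, b);
}
```

Under this assumed nesting, a 64-byte buffer passed to block_demo ends up holding the lines 2 1, 0 3, 0 1, and 0 0: each reference resolves to the most closely nested declaration that encloses it.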

Stack allocation of declarations in blocks • Declarations in each block can be allocated on the stack. • It is similar to a procedure call (with no parameters). • Space is allocated on the stack when we enter the block. • Space is deallocated on the stack when we exit the block.

Lexical scope without nested procedures • C and related languages do NOT allow nested procedures. • A program is a series of declarations and functions. • All non-local references inside functions must refer to declarations at file (global) scope.

Example: lexical scope • Consider the C code: int a[11]; void readarray( void ) { … a … } int partition( int y, int z ) { … a … } void quicksort( int m, int n ) { … } int main( void ) { … a … } • The references to a are always to the array declared on the first line.

Lexical scope • Without nested procedures: • Locals use stack dynamic allocation. • All non-local data is allocated in the static data area. • At compile time, if a reference is not found in the current procedure’s AR, we look in the static data area and use the resulting static address. • Otherwise, the reference is local and accessible relative to the top of stack pointer. • Passing procedures as parameters is also simple if there is no nesting (all non-locals have static addresses).

Lexical scope with nested procedures • Pascal program (lines are counted from the program header):
program sort( input, output );
  var a : array[0..10] of integer;
      x : integer;
  procedure readarray;
    var i : integer;
    begin … a … end { readarray };
  procedure exchange( i, j : integer );
    begin
      x := a[i]; a[i] := a[j]; a[j] := x
    end { exchange };
  procedure quicksort( m, n: integer );
    var k, v : integer;
    function partition( y, z: integer ): integer;
      var i, j : integer;
      begin … a …
        … v …
        … exchange( i, j ); …
      end { partition };
    begin … end { quicksort };
begin … end { sort }.

Nesting depth • The reference to a on line 15: • The reference is inside partition(), which is nested inside quicksort(). • The most closely nested declaration is on line 2, at program (global) scope. • The reference to exchange on line 17: • The reference is in partition(), which is nested in quicksort(). • The most closely nested declaration is on line 7. • The compiler needs to keep track of the NESTING DEPTH of each declaration: • sort() is at depth 1 • quicksort() is at depth 2 • partition() is at depth 3 • i of partition(): depth 3, because a name takes the depth of the procedure that declares it (so a has depth 1 and v has depth 2)

Access links • We need some way to traverse from one AR to another when searching for the declaration corresponding to a reference. • A new pointer, the ACCESS LINK, is added to the AR. • If procedure P is nested immediately inside procedure Q in the program, then the access link in P’s AR points to the access link in the AR of the most recent activation of Q.

Access links • How to find a non-local reference using access links? • Suppose procedure P at nesting depth np refers to a nonlocal “a” with nesting depth na <= np. We find the storage for variable a as follows: • When control is in P, there must be an AR for P on top of the stack. We follow np - na access links. • After following np - na access links, we have the correct AR. The storage for a is some fixed offset relative to the beginning of that AR.
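The lookup just described can be sketched in C. The record layout, the field names access_link and slots, and the function lookup_nonlocal are hypothetical; a real compiler would emit the equivalent loads inline rather than call a helper.

```c
#include <stddef.h>

/* Hypothetical activation record layout, for illustration only. */
typedef struct ActivationRecord {
    struct ActivationRecord *access_link; /* AR of the enclosing procedure */
    int slots[8];                         /* storage for locals, by offset */
} ActivationRecord;

/* Resolve a reference from the AR on top of the stack.
 * hops is np - na, computed at compile time; offset locates the
 * variable inside the AR that is reached. */
int *lookup_nonlocal(ActivationRecord *current, int hops, int offset) {
    ActivationRecord *ar = current;
    while (hops-- > 0) {
        ar = ar->access_link;   /* move one nesting level outward */
    }
    return &ar->slots[offset];
}
```

With ARs for sort, quicksort, and partition chained in that order, a reference to a from partition uses hops = 3 - 1 = 2, while a local of partition uses hops = 0.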

Setting up access links • At compile time, non-local references are represented by the pair (np-na, offset). • We need to set up the access links at procedure call time. • Suppose procedure P at depth np calls procedure X at depth nx. The resulting code depends on whether the called procedure is nested within the caller or not. • Case np < nx : X is nested more deeply than P, so it must be declared inside P, and X’s access link just needs to point to P’s AR. • Case np >= nx : X is at the same level as P or in an outer scope. We have to find the procedure that most closely encloses both P and X. Its AR is reached by following np-nx+1 access links from P’s AR.
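The two cases can be sketched as a single C helper. The Frame type and the name callee_access_link are hypothetical, and the sketch assumes that in the case np < nx the callee is declared directly inside the caller.

```c
#include <stddef.h>

/* Hypothetical frame type: only the access link matters here. */
typedef struct Frame {
    struct Frame *access_link;
} Frame;

/* Compute the access link to store in the callee's new AR.
 * caller is the AR of P (depth np); the callee X has depth nx. */
Frame *callee_access_link(Frame *caller, int np, int nx) {
    if (np < nx) {
        return caller;              /* X is declared inside P */
    }
    Frame *f = caller;
    for (int i = 0; i < np - nx + 1; i++) {
        f = f->access_link;         /* climb toward the common encloser */
    }
    return f;
}
```

For the sort program, partition (depth 3) calling exchange (depth 2) follows 3 - 2 + 1 = 2 links and reaches the AR of sort, which is where exchange is declared.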

Parameter Passing

Parameter Passing • Parameters are the most common way for a calling procedure to communicate with the callee. • Different languages have different parameter semantics. • Mostly, the differences lie in whether the l-value, the r-value, or the text of the actual parameter is passed. • We consider four protocols: • Call by value • Call by reference • Copy-restore • Call by name

Call by value • This is the simplest parameter passing method. • The caller computes r-values for the actuals. • The caller places the resulting values on the stack, in the AR of the callee. • The callee may change the parameters, but this has no effect on the caller. • This is the default protocol in Pascal, and the ONLY protocol in C.

Parameter passing example • The var keyword in the header of swap specifies call-by-reference:
program reference( input, output );
var a, b: integer;
procedure swap( var x, y: integer );
  var temp : integer;
  begin
    temp := x;
    x := y;
    y := temp;
  end;
begin
  a := 1; b := 2;
  swap( a, b );
  writeln( 'a = ', a ); writeln( 'b = ', b )
end.

Call by reference • The caller passes the called procedure a POINTER to the storage address of the actual parameter. • If the actual has an l-value, that l-value (its address) is passed. • If the actual is an expression with no l-value, we place the result of the expression in a temporary and pass a pointer to the temporary. • Pascal uses call by reference if the “var” keyword is used. • C++ uses call by reference if the parameter is declared with “&” (a reference type).
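Because C only has call by value, the contrast between the two protocols can be shown by writing swap twice. This is an illustrative sketch; the function names are made up, and the pointer version only simulates call by reference by passing l-values explicitly.

```c
/* Call by value: x and y are copies living in the callee's AR,
 * so exchanging them has no effect on the caller's variables. */
void swap_by_value(int x, int y) {
    int temp = x;
    x = y;
    y = temp;
}

/* Simulated call by reference: the caller passes the addresses
 * (l-values) of the actuals, and the callee updates them in place. */
void swap_by_reference(int *x, int *y) {
    int temp = *x;
    *x = *y;
    *y = temp;
}
```

Starting from a = 1 and b = 2, swap_by_value(a, b) leaves both unchanged, while swap_by_reference(&a, &b) leaves a = 2 and b = 1.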

Copy-restore • This is a hybrid between call-by-value and call-by-reference (also called copy-in copy-out or value-result). • Before the callee is activated, we evaluate the actuals and put their r-values in the AR for the callee. • But we also compute and save the l-values of the actuals. • In the return sequence, we copy the updated r-values from the callee’s AR back to the locations given by the saved l-values. • Some FORTRAN implementations use this approach.
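Copy-restore and call by reference can give different results when the callee also touches the actual through another name. The sketch below simulates both by hand in C; the global a, the helper body, and both function names are hypothetical.

```c
static int a;   /* a global that the callee can also reach directly */

/* The callee: assigns to its parameter, then to the global. */
static void body(int *x) {
    *x = 2;
    a = 0;
}

/* Call by reference: the parameter is an alias for a. */
int call_by_reference(void) {
    a = 1;
    body(&a);
    return a;               /* the later assignment a = 0 wins */
}

/* Copy-restore, simulated by hand. */
int call_copy_restore(void) {
    a = 1;
    int *saved_lvalue = &a; /* l-value of the actual, saved at the call */
    int formal = a;         /* copy in: r-value placed in the callee's AR */
    body(&formal);          /* callee works on its private copy */
    *saved_lvalue = formal; /* copy out during the return sequence */
    return a;               /* the copied-back value 2 wins */
}
```

Here call_by_reference() returns 0 and call_copy_restore() returns 2, which is why programs that rely on aliasing are sensitive to the protocol.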

Call by name (macro expansion) • In this method, we just substitute the body of the procedure for the procedure call. • In the copied body, the formal parameters are replaced by the text of the actuals. • #define macros in C/C++ use this technique.
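One visible consequence of substituting the text of the actual is that the actual is re-evaluated at every use of the formal. The sketch below counts evaluations for a macro and for an ordinary function; all names in it are made up for illustration.

```c
/* Textual substitution: each use of x becomes a copy of the actual. */
#define SQUARE_MACRO(x) ((x) * (x))

static int calls;   /* how many times the actual has been evaluated */

static int next_value(void) {
    calls++;
    return 3;
}

static int square(int x) {
    return x * x;
}

/* The macro body mentions x twice, so the actual runs twice. */
int evaluations_with_macro(void) {
    calls = 0;
    (void)SQUARE_MACRO(next_value());
    return calls;
}

/* Call by value evaluates the actual exactly once. */
int evaluations_with_function(void) {
    calls = 0;
    (void)square(next_value());
    return calls;
}
```

The parentheses around x in the macro body are the usual precaution for keeping the substituted text grouped as one operand.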

Symbol Tables

Symbol table implementation • The symbol table stores many kinds of information about names: • The NAME itself • STORAGE information • SCOPE information • So a symbol table entry is typically a record data type. • The table itself could be a simple linear array, or a more complex data structure (hash table, etc.).

The NAME entry • Most languages put some bound on the length of ID names. • If the limit is small, we can place the name in the ST entry itself: typedef struct { char name[MAX_LENGTH+1]; … } tSymbolTableEntry; • But otherwise, we should use the heap to store the names and simply point to them: typedef struct { char *name; … } tSymbolTableEntry;

Storage information • The code generator needs to know about the storage required for declared names. • Statically allocated variables just have an offset relative to the beginning of the static data area. • Each definition needs to reserve space in the static data area and advance a pointer to the next available location. • For stack dynamic variables, we need to store the offset of the variable relative to the activation record for the procedure. • Heap dynamic variable storage requirements are not known until runtime.

Linear list representations • We add new ST entries to the end of an array. • The array has to be reallocated if it gets too big. • Search for an item begins at the end and goes backwards, to ensure we get the most recent declaration of a name. • Checking for existence takes n/2 checks on average. • For n insertions and e lookups, we have O(n(n+e)) time. • Usually e >> n, so we can write O(ne). • This running time is generally too large for big programs.
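A minimal sketch of the linear list representation follows. The types Entry and SymTab, the scope field, and the doubling growth policy are assumptions; the sketch stores the caller's name pointer without copying the string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry: a name plus one piece of scope information. */
typedef struct {
    const char *name;   /* not copied; must outlive the table */
    int scope;
} Entry;

typedef struct {
    Entry *items;
    size_t count, capacity;
} SymTab;

/* Append a new entry, growing the array when it is full.
 * Returns 0 on success and -1 if memory runs out. */
int symtab_insert(SymTab *t, const char *name, int scope) {
    if (t->count == t->capacity) {
        size_t cap = t->capacity ? t->capacity * 2 : 8;
        Entry *p = realloc(t->items, cap * sizeof *p);
        if (p == NULL) return -1;
        t->items = p;
        t->capacity = cap;
    }
    t->items[t->count].name = name;
    t->items[t->count].scope = scope;
    t->count++;
    return 0;
}

/* Search from the end so the most recent declaration is found first. */
const Entry *symtab_lookup(const SymTab *t, const char *name) {
    for (size_t i = t->count; i > 0; i--) {
        if (strcmp(t->items[i - 1].name, name) == 0) {
            return &t->items[i - 1];
        }
    }
    return NULL;
}
```

A table is started with SymTab t = {0}; and released with free(t.items). Each lookup may scan the whole array, which is where the O(n(n+e)) bound comes from.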

Hash table representations of the ST • A hash table reduces the time needed to insert into and search the ST. • OPEN HASHING gives us a run time of O(n(n+e)/m) for any m we desire. • The table is an array of m BUCKETS, each holding a linked list of entries. • To determine if s is in the table, we apply a HASH FUNCTION h() to s, such that 0 <= h(s) < m. • Then we search the linked list in bucket h(s).

Hash table representations of the ST • Complexity: the average list length is n/m, so as long as m is within a constant factor of n, the search takes nearly constant time. • For h(s), the simplest method is to add up the ASCII values of the characters in s, divide by m, and take the remainder. • There are MANY other techniques. • Most modern languages have library support for hash tables (in C, see the POSIX functions hcreate(), hsearch(), and hdestroy()).
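Open hashing with the additive hash function can be sketched as follows. The bucket count of 211, the type and function names, and the info field are assumptions chosen for illustration; names are stored by pointer, not copied.

```c
#include <stdlib.h>
#include <string.h>

#define NUM_BUCKETS 211     /* m: an arbitrary choice for this sketch */

typedef struct Node {
    const char *name;       /* not copied; must outlive the table */
    int info;               /* stands in for the rest of the entry */
    struct Node *next;
} Node;

typedef struct {
    Node *buckets[NUM_BUCKETS];
} HashTab;

/* h(s): sum of the character codes of s, modulo m. */
unsigned hash_name(const char *s) {
    unsigned sum = 0;
    while (*s != '\0') {
        sum += (unsigned char)*s++;
    }
    return sum % NUM_BUCKETS;
}

/* Search only the linked list in bucket h(name). */
Node *hash_lookup(HashTab *t, const char *name) {
    for (Node *n = t->buckets[hash_name(name)]; n != NULL; n = n->next) {
        if (strcmp(n->name, name) == 0) return n;
    }
    return NULL;
}

/* Insert at the front of the bucket, so a newer declaration of the
 * same name is found before an older one. */
Node *hash_insert(HashTab *t, const char *name, int info) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    unsigned h = hash_name(name);
    n->name = name;
    n->info = info;
    n->next = t->buckets[h];
    t->buckets[h] = n;
    return n;
}
```

The additive hash is weak: names such as ab and ba contain the same characters and always land in the same bucket, which is one reason the many other techniques exist.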

Scope and the ST • Each entry in a ST corresponds to a declaration of a name. • When we look up a name in the ST, we want the entry for the declaration at the correct scope to be returned. • The simplest approach is to have a separate hash table for every scope. • Another way is to give each procedure a unique number, and append the number to each name, guaranteeing uniqueness.

Dynamic Storage Allocation

Explicit vs. implicit alloc/dealloc • Most languages support dynamic allocation of memory. • Pascal supports new(p) and dispose(p) for pointer types. • C provides malloc() and free() in the standard library. • C++ provides the new and delete operators. • These are all examples of EXPLICIT allocation. • Other languages like Python and Lisp have IMPLICIT allocation.

Garbage • In languages with explicit deallocation, the programmer must be careful to free every dynamically allocated variable, or GARBAGE will accumulate. • Garbage is dynamically allocated memory that is no longer accessible because no pointers are pointing to it. • In some languages with implicit deallocation, GARBAGE COLLECTION is occasionally necessary. • Other languages with implicit deallocation carefully track references to allocated memory (REFERENCE COUNTING) and automatically free memory when nothing refers to it any longer.

Dynamic storage allocation • We assume the heap is initially one contiguous free block of memory. • As memory is allocated and deallocated, fragmentation occurs. • For allocation, we must find a HOLE large enough to hold the requested memory. • For deallocation, we must merge adjacent holes to prevent further fragmentation.
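The sketch below simulates hole management on offsets instead of real memory. It uses first fit, which is only one possible placement policy and is not prescribed above; the Heap and Hole types, the limit of 16 holes, and the requirement that the caller pass the block size to heap_free are all simplifying assumptions.

```c
#include <stddef.h>

#define MAX_HOLES 16

typedef struct { size_t start, size; } Hole;

/* The free space, kept as a list of holes sorted by start offset. */
typedef struct {
    Hole holes[MAX_HOLES];
    size_t count;
} Heap;

static void remove_hole(Heap *h, size_t i) {
    for (size_t j = i + 1; j < h->count; j++) h->holes[j - 1] = h->holes[j];
    h->count--;
}

/* Initially the whole heap is one hole. */
void heap_init(Heap *h, size_t size) {
    h->holes[0] = (Hole){0, size};
    h->count = 1;
}

/* First fit: carve the request out of the first hole that is large
 * enough. Returns the offset, or (size_t)-1 if no hole fits. */
size_t heap_alloc(Heap *h, size_t size) {
    for (size_t i = 0; i < h->count; i++) {
        if (h->holes[i].size >= size) {
            size_t start = h->holes[i].start;
            h->holes[i].start += size;
            h->holes[i].size -= size;
            if (h->holes[i].size == 0) remove_hole(h, i);
            return start;
        }
    }
    return (size_t)-1;
}

/* Return a block and merge it with any adjacent holes.
 * Returns 0 on success, -1 if the hole list is full. */
int heap_free(Heap *h, size_t start, size_t size) {
    size_t i = 0;
    while (i < h->count && h->holes[i].start < start) i++;
    if (i > 0 && h->holes[i - 1].start + h->holes[i - 1].size == start) {
        h->holes[i - 1].size += size;           /* merge with hole before */
        if (i < h->count &&
            h->holes[i - 1].start + h->holes[i - 1].size == h->holes[i].start) {
            h->holes[i - 1].size += h->holes[i].size; /* and hole after */
            remove_hole(h, i);
        }
        return 0;
    }
    if (i < h->count && start + size == h->holes[i].start) {
        h->holes[i].start = start;              /* merge with hole after */
        h->holes[i].size += size;
        return 0;
    }
    if (h->count == MAX_HOLES) return -1;
    for (size_t j = h->count; j > i; j--) h->holes[j] = h->holes[j - 1];
    h->holes[i] = (Hole){start, size};          /* isolated new hole */
    h->count++;
    return 0;
}
```

In a 100-unit heap, allocating three 30-unit blocks and freeing only the first leaves 40 free units split into holes of 30 and 10, so a 40-unit request fails. Freeing the other two blocks merges everything back into a single 100-unit hole.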
