Languages and Compiler Design II: Runtime System
Material provided by Prof. Jingke Li
Stolen with pride and modified by Herb Mayer
PSU Spring 2010, rev.: 4/27/2010
CS322
Agenda
• Runtime Storage Organization
• Static Storage
• Runtime Stack
• System Heap
• Functions and Activations
• Activation Records
• Function Call
• Register Saving
• Scopes
• Function Parameters
Runtime Storage Organization
[Figure: memory layout with Stack, Heap, Static Data, Code, and Reserved Space regions]
Multiple memory uses on a computer:
• OS memory needs; e.g. by default, 32-bit Windows reserves ½ of each process's address space for the OS
• Program code
• User program data
• Function invocations
• Temporaries
• I/O buffers
• Etc.
These uses have different requirements, caused by differences in lifetime, size, and access rights. Result: static space, stack, and heap
Static Storage
Space for static data objects is allocated in a fixed location for the whole lifetime of a program
• Possible when the sizes of the objects are known at compile time
• Static objects can be bound to absolute addresses; not necessarily desirable
• Static allocation requires no runtime management, hence is simple to handle
• Space is wasted if objects are not needed for the complete program lifetime
• Mostly used for global variables, code, and constants
• Early Fortran and Cobol were designed to use only static storage
• Such early languages need no support for recursive functions, nor do they allow dynamic arrays
Runtime Stack
A stack is needed for data that are pushed and popped dynamically, following a last-in, first-out pattern
• Space is needed at the moment of a function call and freed at the moment of return
• Allocation and de-allocation can be implemented cheaply, by adjusting the stack pointer; "old" data remain in memory, though
• More efficient use of space than static allocation
• Most newer imperative languages use stack storage for data associated with activations; this became popular with Algol 60
System Heap
Space for heap data objects can be allocated and freed at any time during program execution. It is the most flexible, but also the most expensive and most dangerous, method of storage allocation (e.g. memory leaks). Typical heap operations are:
• Allocation: acquires free storage for the program. Typically triggered by explicit or implicit user commands, e.g.
    (C)    struct node *root = (struct node *) malloc( sizeof( struct node ) );
    (Java) TreeNode root = new TreeNode( val );
• De-allocation: reclaims (AKA frees) no-longer-needed storage for reuse
• Languages such as C and Pascal contain commands for reclaiming storage
• e.g. free( root ) in C, dispose( root ) in Pascal
• Compaction: constructs larger blocks of free storage from smaller pieces
• Can be triggered by a failed allocation request; often performed as part of garbage collection
• Lisp, ML and some interpreted languages need the heap for activation records
Functions and Activations
• Functions, procedures (methods), and classes constitute a form of programming abstraction
• The focus here is on functions, not classes
• They allow a program to be divided into named components with hidden internals
• A function returns its result to the place of invocation
• They permit code re-use
• Each function invocation at runtime is called an activation
• Each activation has its own data: formals (bound to actuals) and locals
• Storage for these data is called an Activation Record (AR), AKA Stack Frame
• Many activations of the same function can exist at one moment in time, due to recursive calls
• Data associated with one activation are independent of all others
• Normally, an activation record is created when a function is invoked and destroyed when the function returns
Activation Records
An activation record typically contains the following entries:
• Return address: the address of the instruction after the call
• Formal parameters: the sequence of parameters passed to the function by the caller
• At the call: actuals. Inside the function: formals. Actuals are bound to formals
• Return value: a place for storing the function's return value
• Local data: storage for local variables
• Access link: AKA static link, a pointer to the next activation record in the chain used for accessing non-local data
• e.g. the lexically enclosing function's AR, as in Algol, Ada, Pascal, PL/I
• Control link: AKA dynamic link, a pointer to the caller's activation record
• Saved machine status: information about the machine (e.g. register values) just before the function is called
• Temporaries: storage for compiler-allocated temporary objects (e.g. dynamic arrays)
Where Are Activation Records Stored?
Static Allocation: the number of ARs and the size of each AR must be known at compile time
• No runtime management needed
• Multiple invocations of the same function reuse the same AR
• Can't handle recursive functions and dynamic data
• Only early Fortran used this approach
Stack Allocation: ARs are pushed onto and popped off the stack
• Works for block-structured languages: a function must return before its own caller returns
• Very efficient, hence the default choice of most programming languages
• Can't handle "first-class" functions
Heap Allocation: ARs can be created and destroyed at any time
• Needed for implementing functional languages
• High overhead
Stack-Based AR Allocation
[Figure: ARs on the stack, with the stack growth direction marked. Each AR, from its top down: temp N ... temp 1, local N ... local 1, save reg N ... save reg 1, access link, control link, actual N ... actual 1, Return Value. For the current AR, sp marks the top and bp the control link]
In most languages, if function f is declared inside function g, then f can only be invoked within the scope of g. This nesting property of function calls makes it possible to allocate ARs on a stack, and it guarantees that non-local variables still exist when they are needed. A stack implementation is very efficient.
Function Call
When a function is called (a method is activated):
The Caller:
• allocates [part of] an activation record for the callee
• evaluates the actual parameters and stores them into the AR
• stores a return address [or return slot] into the AR
• if needed, saves (some) register values into the AR
• stores the current AR pointer (AKA bp, for base pointer, or fp, for frame pointer) and updates it to point to the callee's AR
• But which place in the AR should it point to?
• transfers control to the callee
The Callee:
• saves (some) register values and other machine status info
• allocates and initializes its local data and begins execution
• allocates temps, if needed
Function Call, Cont'd
Upon returning, the Callee:
• places the return value where the caller can access it
• restores the caller's AR pointer and other registers, using the info saved in the stack marker
• returns control to the caller
The Caller:
• can copy the returned value into its own AR
• on some architectures, frees the space for the actuals
Register Saving
Live registers' contents must be saved in memory before those registers can be used for a new purpose in the callee. The register-saving task can be done by the caller alone, by the callee alone, or split between the two:
• Caller Saving: the caller saves the registers that hold live data, regardless of whether the callee is actually going to use any of these registers. This may end up being unnecessary work
• Callee Saving: the callee saves the registers it is going to use, regardless of whether they hold any live contents. This may also end up being unnecessary work
• Split Saving: designate a set of registers as caller-save registers, and the rest as callee-save registers. The caller may keep live values in callee-save registers across a call without saving them, while the callee may overwrite caller-save registers without saving them
Scopes
Def: A scope is a region of program text over which a name is known, i.e. over which a variable binding is in effect. Scopes are typically introduced by function declarations as well as by program blocks, like { ... } blocks in C++ and Java:
    #include <stdio.h>

    int main(void) {                        // B0
        int a = 0, b = 0;
        {                                   // B1
            int b = 1;
            {                               // B2
                int a = 2;
                printf("%d %d\n", a, b);    // prints 2 1
            }                               // end B2
            {                               // B3
                int b = 3;
                printf("%d %d\n", a, b);    // prints 0 3
            }                               // end B3
            printf("%d %d\n", a, b);        // prints 0 1
        }                                   // end B1
        printf("%d %d\n", a, b);            // prints 0 0
        return 0;
    }                                       // end B0
Lexical Scope Rule
Under lexical scope rules, variables are identified by looking backwards through the program text to find the nearest enclosing declaration. Nearly all programming languages use lexical scope.
For the program below, when f is executed, it needs to look up a value for a, which is a free variable of f. The nearest enclosing declaration in this case is the global declaration. At the time f executes, the global a has the value 5, so f returns 5 + 10, and 15 is printed by the program.
    program main;
      var a : int := 0;
      function f( b : int ) : int is
        return a + b;
      end f;
      function g( c : int ) : int is
        var a : int := 1;
        a := a + 2;
        return f( c );
      end g;
    begin
      a := 5;
      print( g( 10 ) );
    end main;
Nested Scopes
Environments associated with a function:
• Definition Environment: the environment in which the function is defined. Needed if lexical scope is used
• Invocation Environment: the environment in which the function is invoked. Needed if dynamic scope is used
• Passing Environment: the environment in which the function is passed as a parameter. No direct use
Needed Environment
Set up an access link in the AR pointing to the AR of the function's definition environment (def-env) or invocation environment (invoc-env): either another AR on the stack or the global environment.
• For static-scoped languages: the access link should point to the function's def-env, which can be derived from the caller's access link (see next slide). In the case of nested scopes, a chain of access links can be followed to reach every enclosing environment of an inner function
• For dynamic-scoped languages: the access link should point to the function's invoc-env, which is simply the caller's AR(!). Since the control link already points to the caller's AR, there is no need to set up a separate access link
Setting Up Access Links
Assume f calls g, and f and g are defined at scope levels m and n, respectively. Further assume that f's access link is already set up:
• If m > n: for g to be visible to f, g's definition environment must be one of the scopes that enclose f. Traverse f's access links; the AR at scope level n − 1 is the target for g's access link
• If m = n: f and g are defined in the same scope. Simply use f's access link as g's access link
• If m < n: f must be the definition environment of g (so n = m + 1). Let g's access link point to f's AR
Sample: Scope
    program main;
      function count( i : integer; a: Intlist ): integer;
        var sum: integer := 0;
        procedure check_int( j : integer );
          begin -- check_int
            if j = i then sum := sum + 1; end if;
          end check_int;
        procedure do_intlist( a: Intlist );
          begin -- do_intlist
            while (a) loop
              check_int( a^.x ); a := a^.next;
            end loop;
          end do_intlist;
        begin -- count
          do_intlist( a ); count := sum;
        end count;
      procedure print_int( i: integer );
        begin -- print_int
          writeln( i );
        end print_int;
      begin -- main
        var a: Intlist;
        print_int( count( 1, a ) );
    end main;
Execution Scenario
    main calls count
    count calls do_intlist
    do_intlist calls check_int
    · · ·
    main calls print_int
(A snapshot of the ARs on the stack is shown on the right.)
When check_int is passed as a parameter to do_intlist, its access link can be computed by count, since check_int is defined in count's scope.
Function Parameters – e.g. Pascal
    program main;
      procedure do_intlist( a: Intlist; procedure f( i: integer ) );
      begin
        ... f( a^.x ); ...
      end;
      function count( i: integer; a: Intlist ): integer;
        var sum: integer := 0;
        procedure check_int( j: integer );
        begin
          if j = i then sum := sum + 1;
        end;
      begin
        do_intlist( a, check_int );
        count := sum;
      end;
    begin
      var a: Intlist;
      print_int( count( 2, a ) );
    end.
Here check_int is passed as a parameter to do_intlist and gets invoked there; it references two non-local variables, i and sum, which are not global variables either. Nested function definitions like this cannot be written directly in standard C or C++.
Functions as Parameters
• The caller-callee relationship discussed previously does not hold when functions are passed as parameters in languages with nested procedural scopes, like Pascal, Algol, Ada
• check_int's definition environment may have nothing to do with do_intlist. How can we set up the access link for check_int's AR in this case?
• Solution: the routine that passes f as a parameter to g has information about f's definition and can set up the access link for f. It can then pass f's access link together with f
• Effectively, when passing a function as a parameter, we should pass a closure (function pointer plus its environment) instead of just a function pointer
Passing Global Functions
• Global functions have a unique feature: their definition environment is the global scope. There is no need to set up an access link for a global function's invocation, since any variable non-local to such a function must be a global variable, which can be accessed directly
• Example: in C, all functions are defined at global scope, hence there is no need for closures to handle function parameters
• Side note: gcc extends C with nested function definitions. A nested function can be passed as a parameter (gcc builds a small stack-allocated trampoline that supplies its access link), but calling it after the enclosing function has returned is undefined: the result's correctness is not guaranteed(!)
Functions as Return Values
• Going one step further, suppose that function values are treated like other values, e.g. they can be returned as function results or stored into variables (the following example is in ML):
    type counter = int list -> int
    fun make_counter( i : int ) : counter =
      let
        fun count( a : int list ) =
          let
            val sum = ref 0
            fun check_int( j : int ) =
              if j = i then sum := !sum + 1 else ()
          in
            do_intlist( a, check_int );
            !sum
          end
      in
        count
      end
    val g : counter = make_counter(2);
    val c : int list = ...;
    val c2 : int = g(c);
• A scenario: main calls make_counter, which returns count; main calls count; count calls do_intlist; do_intlist calls check_int
Functions as Return Values, Cont'd
• Problem: check_int requires the value of the non-local variable i, which is the parameter to make_counter, but the activation of make_counter is no longer live when check_int is called!
• If i is stored in the activation record for make_counter and that activation record is stack-allocated, it will be gone at the point where check_int needs it!
• Solution: store activation records in the heap
• Special case: again, if a global function is returned as a return value, there is no problem executing it later, since all its non-local variables are global variables
Handling Program Blocks
Nested program blocks can have their own local variables, e.g.
    if (i>j) { int x; ... } else { double y[100]; ... }
Where should these variables be stored?
• Solution 1: consider a block as an in-line function without parameters, and create an AR for it. Advantage: efficient use of storage. Downside: high runtime overhead
• Solution 2: use ARs only for true functions. If there are blocks within a function, statically collect the storage requirements of each block, then compute the maximum amount of storage needed for handling all blocks, and allocate that in the function's AR. Advantage: no runtime overhead. Downside: may waste storage space