Structure of Compilers

Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Tokens Syntactic Structure Semantic Analysis Parser Intermediate Representation Optimizer Symbol Table Code Generator Target machine code

Binding • Binding - association of an operation and a symbol. • Binding time - when does the association take place? • Design time • Compile time • Link time • Runtime

Binding time • Example: int count; count = count + 5; • Type of count is bound at compile time. • Value of count bound at execution time. • Meaning of + bound at compile time. • Possible types for count, internal representation of 5 and set of possible meanings for + bound at compiler design time.

Binding • Static • It occurs before runtime and remains unchanged throughout program execution. • Dynamic • It occurs at runtime and can change in the course of program execution.

Type Binding • Before a variable can be referenced in a program, it must be bound to a data type. • Two important questions to ask: 1. How is the type specified? • Explicit Declaration • Implicit Declaration • All variable names that start with the letters ‘i’ - ‘r’ are integer, real otherwise • @name is an array, %something is a hash structure • Determined by context and value

Type Binding 2. When does the binding take place • Explicit declaration (static) • Implicit declaration (static) • Determined by context and value (dynamic) • Dynamic Type Binding • When a variable gets a value, the type of the variable is determined right there and then.

Dynamic Type Binding • Specified through an assignment statement (set x ‘(1 2 3)) <== x becomes a list (set x ‘a) <== x becomes an atom • Advantage: • flexibility (generic program units) • Disadvantages: 1. High cost (dynamic type checking and interpretation) 2. Type error detection by the compiler is difficult

Type Inference • Rather than by assignment statement, types are determined from the context of the reference. • Some languages don’t support assignment statements • Example: (define someFunction (m n) .... ) (someFunction ‘a ‘b) <== m becomes an atom (someFunction ‘(a b c) ‘(1 2 3)) <= m becomes a list

Storage Bindings • Allocation • getting a cell from some pool of available cells • Deallocation • putting a cell back into the pool • The lifetime of a variable is the time during which it is bound to a particular memory cell.

Variable lifetime • Static • bound to memory cells before execution begins and remains bound to the same memory cell throughout execution. • C’s static variables • Advantage: • efficiency (direct addressing), • history-sensitive subprogram support • Disadvantage: • lack of flexibility (no recursion)

Variable lifetime • Stack-dynamic • Storage bindings are created for variables when their declaration statements are elaborated. • If scalar, all attributes except address are statically bound. • e.g. local variables in Pascal and C subprograms • Advantage: • allows recursion; conserves storage • Disadvantages: • Overhead of allocation and deallocation • Subprograms cannot be history sensitive

Variable Lifetime • Explicit heap-dynamic • Allocated and deallocated by explicit directives, specified by the programmer, which take effect during execution. • Referenced only through pointers or references • e.g. dynamic objects in C++ (via new and delete) • all objects in Java • Advantage: • provides for dynamic storage management • Disadvantage: • inefficient and unreliable

Variable lifetime • Implicit heap-dynamic • Allocation and deallocation caused by assignment statements • e.g. Lisp variables • Advantage: • flexibility • Disadvantages: • Inefficient, because all attributes are dynamic • Loss of error detection

Type Checking - • Generalize the concept of operands and operators to include subprograms and assignments • Type checking is the activity of ensuring that the operands of an operator are of compatible types • A compatible type is one that is either legal for the operator, or is allowed under language rules to be implicitly converted, by compiler-generated code, to a legal type. This automatic conversion is called a coercion.

Type Checking • A type error is the application of an operator to an operand of an inappropriate type. • If all type bindings are static, nearly all type checking can be static • If type bindings are dynamic, type checking must be dynamic • A programming language is strongly typed if type errors are always detected.

Type Checking • Advantage of strong typing: • Allows the detection of the misuses of variables that result in type errors. • Languages: • 1. FORTRAN 77 is not: parameters, EQUIVALENCE • 2. Pascal is not: variant records • 3. Modula-2 is not: variant records, WORD type • 4. C and C++ are not: parameter type checking can be avoided; unions are not type checked. • 5. Ada is, almost (UNCHECKED CONVERSION is loophole) (Java is similar) • Coercion rules strongly affect strong typing • they can weaken it considerably (C++ versus Ada)

Name Type Compatibility • Type compatibility by name means the two variables have compatible types if they are in either the same declaration or in declarations that use the same type name • Easy to implement but highly restrictive: • Subranges of integer types are not compatible with integer types • Formal parameters must be the same type as their corresponding actual parameters

Name Type Compatibility type indextype = 1..100; //subrange var count : integer; index : indextype; Is count and index name type compatible?

Structure Type Compatibility • Type compatibility by structure means that two variables have compatible types if their types have identical structures • More flexible, but harder to implement type myType1 = double; type myType2 = double; myType1 data1; myType2 data2; data1 = data2; //????

Type Compatibility type type1 = arary [1..10] of integer; type2 = array [1..10] of integer; type3 = type2; type2 is compatible with type3 by name type1 is compatible with type2 by structure

Type Compatibility • Anonymous types A: array [1..10] of INTEGER; B: array [1..10] of INTEGER; C, D: array [1..10] of INTEGER;

Scope • The scope of a variable is the range of statements over which it is visible. • The nonlocal variables of a program unit are those that are visible but not declared there. • The scope rules of a language determine how references to names are associated with variables.

Static scope • Based on program text • To connect a name reference to a variable, you (or the compiler) must find the declaration. • Search process: • search declarations, first locally, then in increasingly larger enclosing scopes, until one is found for the given name. • Enclosing static scopes (to a specific scope) are called its static ancestors; the nearest static ancestor is called a static parent.

Static Scope • Blocks - a method of creating static scopes inside program units--from ALGOL 60 C and C++:for (...) { int index; ... } Ada: declare LCL : FLOAT; begin ... end

Static Scope: Procedures Evaluation of Static Scoping (for nested procedures) Consider the example: MAIN Define Sub A Define Sub C Define Sub D Call Sub C Define Sub B Define Sub E Call Sub E Call Sub A MAIN A C D B E

Static Scope: Variables program main; var x: integer; procedure sub1; begin print(x); end; {sub1} procedure sub2; var x: integer; begin x := 10; sub1; end; begin x := 5; sub2 end. Output 10 or 5?

Static Scope • Suppose the spec is changed so that D must now access some data in B • Solutions: 1. Put D in B (but then C can no longer call it and D cannot access A's variables) 2. Move the data from B that D needs to MAIN (but then all procedures can access them) • Same problem for procedure access! • Overall: static scoping often encourages many globals

Dynamic Scope • Based on calling sequences of program units, not their textual layout (temporal versus spatial) • References to variables are connected to declarations by searching back through the chain of subprogram calls that forced execution to this point. • Evaluation of Dynamic Scoping: • Advantage: convenience • Disadvantage:hard to debug, hard to understand the code

Dynamic Scope: Variables program main; var x: integer; procedure sub1; begin print(x); end; {sub1} procedure sub2; var x: integer; begin x := 10; sub1; end; begin x := 5; sub2 end. Output 10 or 5?

Scope vs. Lifetime • Scope and lifetime are sometimes closely related, but are different concepts!! • Consider a static variable in a C or C++ function void someFunction() { static int x; ... }

Intermediate Representation • Almost no compiler produces code without first converting a program into some intermediate representation that is used just inside the compiler. • This intermediate representation is called by various names: • Internal Representation • Intermediate representation • Intermediate language

Intermediate Representation • Intermediate Representations are also called by the form the intermediate language takes: • tuples • abstract syntax trees • Triples • Simplied language

Intermediate Form • In general, an intermediate form is kept around only until the compiler generates code; then it cane be discarded. • Another difference in compilers is how much of the program is kept in intermediate form; this is related to the question of how much of the program the compiler looks at before it starts to generate code. • There is a wide spectrum of choices.

Abstract Syntax Tree x = y + 3; = x + y 3

Quadruples y a x b T1 T2 c T3 T4 y= a*(x+b)/(x-c); T1= x+b; (+, 3, 4, 5) T2=a*T1; (*, 2, 5, 6) T3=x-c; (-, 3, 7, 8) T4=T2/T3; (/, 6, 8, 9) y=T4; (=, 0, 9, 1)

Structure of Compilers

Structure of Compilers

Presentation Transcript

Advanced Compilers

Honors Compilers

Compilers

Compilers

Compilers:

Optimizing Compilers

Compilers

COMPILERS

Compilers

COMPILERS

Compilers

Compilers

Advanced Compilers

Compilers

Compilers

Compilers

Compilers