CSC 415: Translators and Compilers • Spring 2009 • Chapter 5: Contextual Analysis
Chapter 5: Contextual Analysis • Identification • Monolithic Block Structure • Flat Block Structure • Nested Block Structure • Attributes • Standard Environment • Type Checking • A Contextual Analysis Algorithm • Case Study: Contextual Analysis in the Triangle Compiler
Contextual Analysis • Given a parsed program, the purpose of contextual analysis is to check that the program conforms to the source language’s contextual constraints. • Scope rules: rules governing declarations and applied occurrences of identifiers • Type rules: rules that allow us to infer the types of expressions, and to decide whether each expression has a valid type • Analysis of the program to determine correctness with respect to the language definition (beyond structure)
Contextual Analysis • Contextual analysis consists of two sub-phases: • Identification: applying the source language’s scope rules to relate each applied occurrence of an identifier to its declaration (if any). • Type checking: applying the source language's type rules to infer the type of each expression, and compare that type with the expected type.
Structure of a Compiler • [Figure: compiler pipeline. Source code → Lexical Analyzer → tokens → Parser → parse tree → Semantic Analyzer (identification, type checking) → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → assembly code. A Symbol Table is shown alongside the phases.]
Identification • Relate each applied occurrence of an identifier in the source program to the corresponding declaration • If there is no corresponding declaration, the program is ill-formed: report an error • Identification can hurt compiler efficiency if done naively • Searching the AST for the declaration at every applied occurrence is inefficient
Identification Table • Also known as the symbol table • Associates identifiers with their attributes • Basic operations • Make the identification table empty • Add an entry associating a given identifier with a given attribute • Retrieve the attribute (if any) associated with a given identifier • Attribute • Consists of information relevant to contextual analysis • Obtained from the identifier’s declaration
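The three basic operations can be sketched in Java as follows. This is a minimal illustration rather than the table of any particular compiler: the class name `SimpleIdTable`, its method names, and the use of a plain `String` as the attribute are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical identification table: maps identifiers to attributes.
// The attribute is a plain String here purely for illustration.
class SimpleIdTable {
    private final Map<String, String> entries = new HashMap<>();

    // Make the identification table empty.
    void clear() {
        entries.clear();
    }

    // Add an entry associating the given identifier with the given attribute.
    void enter(String id, String attr) {
        entries.put(id, attr);
    }

    // Retrieve the attribute (if any) associated with the given identifier.
    Optional<String> retrieve(String id) {
        return Optional.ofNullable(entries.get(id));
    }
}
```

Returning an `Optional` makes the "if any" in the third operation explicit: the caller has to handle the case where the identifier has no entry.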
Identification Table • Each declaration in a program has a defined scope • The portion of the program over which the declaration takes effect • Block: any program phrase that delimits the scope of declarations within it • Example: the Triangle block command • let D in C • The scope of each declaration in D extends over the subcommand C
Identification Table: Structure/Implementation • Maintain scope • An identifier should be found in the table only when valid • If an identifier is defined in multiple scopes, then a lookup in the table must provide the appropriate meaning for the use • Efficiency • How fast is lookup? • How fast to enter/exit a scope? • What is the overall table size?
Identification Table: Structure/Implementation • Different implementations • Organized for efficient retrieval • Binary search tree • Hash table
Identification Table: Functionality • A mapping of identifiers to their meanings • Information • Name • Type • Location • Operations • Create • Insert • Lookup • Delete • Update entry • Entering a new scope • Leaving a scope
Block Structures • Monolithic block structure • Basic and Cobol • Flat block structure • Fortran • Nested block structure • Pascal, Ada, C, and Java
Monolithic Block Structure • The only block is the entire program • All declarations are global • Simple rules • No identifier may be declared more than once • For every applied occurrence of an identifier I, there must be a corresponding declaration of I • No identifier may be used unless declared • The identification table should contain entries for all declarations in the source program • At most, one entry for each identifier • The table contains an identifier I and the associated attribute A
Monolithic Block Structure
• Create new table: create command
• At declaration for identifier I, make table entry: insert command
• At applied occurrence of identifier I, retrieve information from table: lookup command

Example program:

    program
        (1) integer b = 10
        (2) integer n
        (3) char c
    begin
        …
        n = n * b
        …
        write c
        …
    end

Identification table:

    Identification   Attribute
    b                (1)
    n                (2)
    c                (3)
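The two monolithic scope rules (no duplicate declarations, no use without a declaration) can be sketched as a small checker. The class `MonolithicChecker`, its method names, and the error messages are hypothetical, and attributes are again plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical checker for a language with monolithic block structure.
class MonolithicChecker {
    private final Map<String, String> table = new HashMap<>();
    final List<String> errors = new ArrayList<>();

    // Declaration of id: the table holds at most one entry per identifier.
    void declare(String id, String attr) {
        if (table.containsKey(id)) {
            errors.add(id + " is declared more than once");
        } else {
            table.put(id, attr);
        }
    }

    // Applied occurrence of id: returns its attribute, or null (and records
    // an error) when there is no corresponding declaration.
    String use(String id) {
        String attr = table.get(id);
        if (attr == null) {
            errors.add(id + " is not declared");
        }
        return attr;
    }
}
```

Because every declaration is global, a single map is enough and no scope bookkeeping is needed.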
Flat Block Structure • Program partitioned into several disjoint blocks • Two scope levels • Some declarations are local in scope • Identifiers restricted to a particular block • Other declarations are global in scope • Identifiers allowed anywhere in the program – the program as a whole is a block • Less simple rules • No globally declared identifier may be re-declared globally • But the same identifier may also be declared locally • No locally declared identifier may be re-declared in the same block • The same identifier may be declared locally in several different blocks • For every applied occurrence of an identifier I in a block B, there must be a corresponding declaration of I • Either a global declaration of I or a declaration of I local to B • A minor complication is that the table must distinguish global and local declaration entries
Flat Block Structure
• Create new table: create command
• At start of a block: enter new scope command
• At end of a block: leave scope command, delete command
• At declaration for identifier I, make table entry: insert command
• At applied occurrence of identifier I, retrieve information from table: lookup command

Example program:

    (1) procedure Q
        (2) real r
        (3) real pi = 3.14
    begin … end Q

    (4) procedure R
        (5) integer c
    begin … end R

    program
        (6) integer i
        (7) boolean b
        (8) char c
    begin … call R … end

Identification table while analyzing Q:

    Level    Identification   Attribute
    global   Q                (1)
    local    r                (2)
    local    pi               (3)

Identification table while analyzing R:

    Level    Identification   Attribute
    global   Q                (1)
    global   R                (4)
    local    c                (5)

Identification table after leaving R:

    Level    Identification   Attribute
    global   Q                (1)
    global   R                (4)

Identification table while analyzing the main program:

    Level    Identification   Attribute
    global   Q                (1)
    global   R                (4)
    local    i                (6)
    local    b                (7)
    local    c                (8)
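One way to distinguish global from local entries is to keep them in two separate maps, as in the sketch below. The class `FlatTable` and its method names are assumptions for illustration; a real implementation might instead store the level inside each entry, as the tables above suggest.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical identification table for flat block structure:
// one map for global entries, one for the entries of the current block.
class FlatTable {
    private final Map<String, String> globals = new HashMap<>();
    private final Map<String, String> locals = new HashMap<>();

    // Enter new scope: the current block starts with no local entries.
    void enterBlock() {
        locals.clear();
    }

    // Leave scope: delete every local entry of the block just analyzed.
    void leaveBlock() {
        locals.clear();
    }

    // Both declare methods return false when the identifier is already
    // declared at the same level, which the scope rules forbid.
    boolean declareGlobal(String id, String attr) {
        return globals.putIfAbsent(id, attr) == null;
    }

    boolean declareLocal(String id, String attr) {
        return locals.putIfAbsent(id, attr) == null;
    }

    // Applied occurrence: a local declaration takes precedence over a global one.
    String lookup(String id) {
        String attr = locals.get(id);
        return attr != null ? attr : globals.get(id);
    }
}
```

Replaying the example program against this sketch, the local `c` declared at (5) is gone by the time the main program declares its own `c` at (8), so the two never conflict.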
Nested Block Structure • Blocks may be nested one within another • Many scope levels • Declarations in the outermost block are global in scope • The outermost block is at scope level 1 • Declarations inside an inner block are local to that block • Every inner block is completely enclosed by another block • A block immediately inside the outermost block is at scope level 2 • A block enclosed by a level-n block is at scope level n+1
Nested Block Structure • More complex rules • No identifier may be declared more than once in the same block • The same identifier may be declared in different blocks, even if they are nested • For every applied occurrence of an identifier I in a block B, there must be a corresponding declaration of I • It must be in B itself • Or in the block B’ immediately enclosing B • Or in B’’ immediately enclosing B’ • And so on: the declaration that applies is the one in the smallest enclosing block that contains any declaration of I
Nested Block Structure
• Create new table: create command
• At start of a block: enter new scope command
• At end of a block: leave scope command, delete command
• At declaration for identifier I, make table entry: insert command
• Level number determined by number of calls to enter new scope
• At applied occurrence of identifier I, retrieve information from table using highest level for I: lookup command

Example program:

    let
        (1) var a: Integer;
        (2) var b: Boolean
    in begin
        …;
        let
            (3) var b: Integer;
            (4) var c: Boolean
        in begin
            …;
            let
                (5) var d: Integer
            in …;
            …
        end;
        …
        let
            (6) var d: Boolean;
            (7) var e: Integer
        in …;
        …
    end

Identification table in the outer block, after (1) and (2):

    Level   Identification   Attribute
    1       a                (1)
    1       b                (2)

Identification table in the first inner block, after (3) and (4):

    Level   Identification   Attribute
    1       a                (1)
    1       b                (2)
    2       b                (3)
    2       c                (4)

Identification table in the innermost block, after (5):

    Level   Identification   Attribute
    1       a                (1)
    1       b                (2)
    2       b                (3)
    2       c                (4)
    3       d                (5)

Identification table in the second inner block, after (6) and (7):

    Level   Identification   Attribute
    1       a                (1)
    1       b                (2)
    2       d                (6)
    2       e                (7)
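A nested-scope table can be sketched as a stack of maps, one map per open block. This is one possible organization, not necessarily the one used in the Triangle compiler; the class `NestedTable` and its method names are assumptions for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical identification table for nested block structure.
class NestedTable {
    // The head of the deque is the innermost open scope.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    NestedTable() {
        openScope(); // the outermost block is at scope level 1
    }

    // Enter new scope.
    void openScope() {
        scopes.push(new HashMap<>());
    }

    // Leave scope: every entry made at the current level is deleted.
    void closeScope() {
        scopes.pop();
    }

    // The current level equals the number of scopes still open.
    int level() {
        return scopes.size();
    }

    // Returns false if id is already declared in the current block.
    boolean enter(String id, String attr) {
        return scopes.peek().putIfAbsent(id, attr) == null;
    }

    // Searches from the innermost scope outward, so the entry at the
    // highest level for id is the one found; null means "not declared".
    String retrieve(String id) {
        for (Map<String, String> scope : scopes) {
            String attr = scope.get(id);
            if (attr != null) {
                return attr;
            }
        }
        return null;
    }
}
```

Lookup cost grows with nesting depth in this layout. A single hash table whose entries carry a level number trades that for more work when a scope is closed.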
Attributes Examples • Kind • constant • variable • procedure • function • type • Type • boolean • character • integer • record • array
Attributes • Information to be extracted from a declaration • Kind: constant, variable, procedure, function, or type • A procedure or function declaration includes a list of formal parameters, each of which may be a constant, variable, procedural, or functional parameter • The language provides whole families of record and array types • How to manage attribute information • Option 1: extract type information from declarations and store it in the identification table • Could be complex for a realistic programming language • Could require tedious programming • Option 2: use the AST • Entries in the identification table hold pointers to the AST nodes that declare each identifier
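The "use the AST" option can be sketched by making the attribute a reference to the declaration node itself. The node classes below are simplified stand-ins invented for this example (types are plain strings), not the AST classes of any real compiler.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical AST declaration nodes, reduced to the fields needed here.
abstract class Declaration {
    final String id;
    Declaration(String id) { this.id = id; }
}

class VarDeclaration extends Declaration {
    final String type;
    VarDeclaration(String id, String type) { super(id); this.type = type; }
}

class ConstDeclaration extends Declaration {
    final String type;
    ConstDeclaration(String id, String type) { super(id); this.type = type; }
}

class DeclarationTable {
    // The attribute is a pointer to the declaration node, so nothing is
    // copied out of the declaration when the entry is made.
    private final Map<String, Declaration> entries = new HashMap<>();

    void enter(Declaration decl) {
        entries.put(decl.id, decl);
    }

    Declaration retrieve(String id) {
        return entries.get(id);
    }

    // Kind and type are read off the node on demand.
    static String describe(Declaration decl) {
        if (decl instanceof VarDeclaration) {
            return "variable of type " + ((VarDeclaration) decl).type;
        }
        if (decl instanceof ConstDeclaration) {
            return "constant of type " + ((ConstDeclaration) decl).type;
        }
        return "unknown";
    }
}
```

The table stays small and uniform however rich the declarations become, because any detail a later check needs can be reached through the node.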
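Attributes • [Figure: AST of the nested-block example (Program, LetCommand, SequentialDeclaration, VarDeclaration, and SequentialCommand nodes) beside two snapshots of the identification table. The attribute of each entry is a pointer to the VarDeclaration node that declares the identifier. Outer block: a (level 1) points to declaration (1), of type int, and b (level 1) points to declaration (2), of type bool. Inner block: d (level 2) points to declaration (6), of type bool, and e (level 2) points to declaration (7), of type int.]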
Standard Environment • Predefined constants, variables, types, procedures, and functions • These are loaded into the identification table • Scope rules for the standard environment: two possibilities • A scope enclosing the entire program (level 0) • The same scope level as the global declarations (C is an example)
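The level-0 option can be sketched by pushing one extra scope, filled with the standard declarations, before the program's own outermost scope is opened. The class name and the handful of standard identifiers below are illustrative assumptions, and attributes are plain strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical scoped table whose outermost scope is the standard environment.
class TableWithStdEnv {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    TableWithStdEnv() {
        // Level 0: the standard environment, loaded before the program is analyzed.
        Map<String, String> std = new HashMap<>();
        std.put("Integer", "standard type");
        std.put("Boolean", "standard type");
        std.put("true", "standard constant");
        std.put("false", "standard constant");
        scopes.push(std);
        // Level 1: the program's global declarations.
        scopes.push(new HashMap<>());
    }

    int level() {
        return scopes.size() - 1;
    }

    // Returns false only if id is already declared at the current level.
    boolean enter(String id, String attr) {
        return scopes.peek().putIfAbsent(id, attr) == null;
    }

    // Innermost scope first; level 0 is searched last.
    String retrieve(String id) {
        for (Map<String, String> scope : scopes) {
            String attr = scope.get(id);
            if (attr != null) {
                return attr;
            }
        }
        return null;
    }
}
```

With this layout a program can redeclare a standard identifier, since the new entry lands at level 1 rather than level 0. If the standard environment shared the level of the global declarations, the same redeclaration would be rejected as a duplicate.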
Type Checking • The second task of the contextual analyzer is to ensure that the source program contains no type errors • Once an applied occurrence of an identifier has been identified, the contextual analyzer checks that the identifier is used in a way consistent with its declaration
Type Checking • In a statically typed language, the compiler can detect any type errors without actually running the program • For every expression E in the language, the compiler can infer either that E has some type T or that E is ill-typed • If E does have type T, then E will always yield a value of type T • If a value of type T’ is expected, then the compiler checks that T’ is equivalent to T
Type Checking • Infers the type of each expression bottom-up • Starting with literals and identifiers, and working up through larger and larger subexpressions • Literal: the type of a literal is immediately known • Identifier: the type of an applied occurrence of identifier I is obtained from the corresponding declaration of I • Unary operator application: • Consider “O E” where O is a unary operator of type T1 → T2 • The type checker ensures that E’s type is equivalent to T1 • It infers that the type of “O E” is T2 • Otherwise there is a type error • Binary operator application: • Consider “E1 O E2” where O is a binary operator of type T1 × T2 → T3 • The type checker ensures that E1’s type is equivalent to T1 • and that E2’s type is equivalent to T2 • It infers that the type of “E1 O E2” is T3 • Otherwise there is a type error
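The bottom-up rules above can be sketched as a recursive `check` method on a tiny expression tree. Everything here is an assumption made for the example: the node classes, the `Ty` enum with an `ERROR` value standing for "ill-typed", and the small operator table.

```java
import java.util.HashMap;
import java.util.Map;

enum Ty { INT, BOOL, ERROR }

// Illustrative operator signatures: unary operators map to {T1, T2},
// binary operators map to {T1, T2, T3}.
class Ops {
    static final Map<String, Ty[]> UNARY = new HashMap<>();
    static final Map<String, Ty[]> BINARY = new HashMap<>();
    static {
        UNARY.put("!", new Ty[] { Ty.BOOL, Ty.BOOL });
        UNARY.put("-", new Ty[] { Ty.INT, Ty.INT });
        BINARY.put("+", new Ty[] { Ty.INT, Ty.INT, Ty.INT });
        BINARY.put("<", new Ty[] { Ty.INT, Ty.INT, Ty.BOOL });
    }
}

abstract class Expr {
    // env maps each declared identifier to its type.
    abstract Ty check(Map<String, Ty> env);
}

class Lit extends Expr {
    final Ty ty;
    Lit(Ty ty) { this.ty = ty; }
    // The type of a literal is immediately known.
    Ty check(Map<String, Ty> env) { return ty; }
}

class Id extends Expr {
    final String name;
    Id(String name) { this.name = name; }
    // The type comes from the declaration; undeclared means ill-typed.
    Ty check(Map<String, Ty> env) { return env.getOrDefault(name, Ty.ERROR); }
}

class Unary extends Expr {
    final String op;
    final Expr e;
    Unary(String op, Expr e) { this.op = op; this.e = e; }
    Ty check(Map<String, Ty> env) {
        Ty[] sig = Ops.UNARY.get(op);
        Ty t = e.check(env);
        return (sig != null && t == sig[0]) ? sig[1] : Ty.ERROR;
    }
}

class Binary extends Expr {
    final Expr left;
    final String op;
    final Expr right;
    Binary(Expr left, String op, Expr right) {
        this.left = left; this.op = op; this.right = right;
    }
    Ty check(Map<String, Ty> env) {
        Ty[] sig = Ops.BINARY.get(op);
        Ty t1 = left.check(env);
        Ty t2 = right.check(env);
        return (sig != null && t1 == sig[0] && t2 == sig[1]) ? sig[2] : Ty.ERROR;
    }
}
```

Because `ERROR` never matches an operand type in the table, one ill-typed subexpression makes every enclosing expression ill-typed as well. A production checker would also report each error where it first occurs.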
Type Checking • Type of a nontrivial expression is inferred from the types of its sub-expressions, using the appropriate type rules • Must be able to test if two given types T and T’ are equivalent
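What "equivalent" means depends on the language. The sketch below assumes structural equivalence, where two types are equivalent when they are built the same way from equivalent parts. Name equivalence is the other common choice and would need a different test. The classes `PrimType`, `ArrayType`, and `RecordType` are hypothetical.

```java
import java.util.List;

abstract class Type {
    abstract boolean equivalent(Type other);
}

class PrimType extends Type {
    final String name;
    PrimType(String name) { this.name = name; }
    boolean equivalent(Type other) {
        return other instanceof PrimType && name.equals(((PrimType) other).name);
    }
}

class ArrayType extends Type {
    final int size;
    final Type elem;
    ArrayType(int size, Type elem) { this.size = size; this.elem = elem; }
    // Same length and equivalent element types.
    boolean equivalent(Type other) {
        if (!(other instanceof ArrayType)) {
            return false;
        }
        ArrayType a = (ArrayType) other;
        return size == a.size && elem.equivalent(a.elem);
    }
}

class RecordType extends Type {
    final List<String> names;
    final List<Type> types;
    RecordType(List<String> names, List<Type> types) {
        if (names.size() != types.size()) {
            throw new IllegalArgumentException("one type per field is required");
        }
        this.names = names;
        this.types = types;
    }
    // Same field names in the same order, with equivalent field types.
    boolean equivalent(Type other) {
        if (!(other instanceof RecordType)) {
            return false;
        }
        RecordType r = (RecordType) other;
        if (!names.equals(r.names)) {
            return false;
        }
        for (int i = 0; i < types.size(); i++) {
            if (!types.get(i).equivalent(r.types.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

The test recurses through composite types, so two separately written array or record types compare as equivalent even though they are distinct objects.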
Type Checking: Constant or Variable Identifier • [Figure: a SimpleVname whose identifier x is linked to the ConstDeclaration of x, whose expression has type T. After checking, the SimpleVname is annotated with type T.]
Type Checking: Variable Declaration • [Figure: a SimpleVname whose identifier x is linked to the VarDeclaration of x, which declares x with type T. After checking, the SimpleVname is annotated with type T.]
Type Checking: Binary Operator • < is of type int × int → bool • [Figure: a BinaryExpression applying the operator < to two operands of type int. After checking, the BinaryExpression is annotated with type bool.]
Type Checking • Each applied occurrence of an identifier must be identified before type checking can proceed • + is of type int × int → int • * is of type float × float → float
Type Checking • Different classes of phrase are checked in different ways • Checking a command C determines whether C is well-formed • Checking an expression E determines whether E is well-formed, and infers the type of E • Checking a declaration D determines whether D is well-formed, and makes entries in the identification table for the identifiers declared in D • Checking an assignment command V := E • Checking V to determine its type and ensure that it is a variable • Checking E to determine its type • Testing whether the two types are compatible • Checking a block command let D in C • Opening an inner scope • Checking D • Checking C • Closing the inner scope • In the Triangle compiler, the visitor methods perform the contextual analysis
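The steps for the assignment and block commands can be sketched as two methods on a small checker. This is a simplified illustration, not the Triangle visitor code: the names `CommandChecker`, `IdEntry`, and `SimpleType` are hypothetical, the type of E is assumed to have been inferred already, and D and C are passed as `Runnable` callbacks instead of AST nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

enum SimpleType { INT, BOOL }

// Attribute of an identifier: its type and whether it is a variable.
class IdEntry {
    final SimpleType type;
    final boolean isVariable;
    IdEntry(SimpleType type, boolean isVariable) {
        this.type = type;
        this.isVariable = isVariable;
    }
}

class CommandChecker {
    private final Deque<Map<String, IdEntry>> scopes = new ArrayDeque<>();
    final List<String> errors = new ArrayList<>();

    CommandChecker() {
        scopes.push(new HashMap<>());
    }

    // Checking a declaration: make an entry in the current scope.
    void declare(String id, SimpleType type, boolean isVariable) {
        if (scopes.peek().putIfAbsent(id, new IdEntry(type, isVariable)) != null) {
            errors.add(id + " is already declared in this block");
        }
    }

    private IdEntry lookup(String id) {
        for (Map<String, IdEntry> scope : scopes) {
            IdEntry entry = scope.get(id);
            if (entry != null) {
                return entry;
            }
        }
        return null;
    }

    // Checking V := E, given the already inferred type of E.
    void checkAssign(String v, SimpleType exprType) {
        IdEntry entry = lookup(v);
        if (entry == null) {
            errors.add(v + " is not declared");
        } else if (!entry.isVariable) {
            errors.add(v + " is not a variable");
        } else if (entry.type != exprType) {
            errors.add("incompatible types in assignment to " + v);
        }
    }

    // Checking let D in C: open scope, check D, check C, close scope.
    void checkLet(Runnable declarations, Runnable command) {
        scopes.push(new HashMap<>());
        declarations.run();
        command.run();
        scopes.pop();
    }
}
```

A short usage example: `checker.checkLet(() -> checker.declare("k", SimpleType.INT, false), () -> checker.checkAssign("k", SimpleType.INT))` records the error "k is not a variable", because `k` was declared as a constant.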
Type Checking in Triangle -- while

    public Object visitWhileCommand(WhileCommand ast, Object o) {
      // Check the loop condition and obtain its inferred type.
      TypeDenoter eType = (TypeDenoter) ast.E.visit(this, null);
      // The condition must be of type Boolean.
      if (! eType.equals(StdEnvironment.booleanType))
        reporter.reportError("Boolean expression expected here", "",
                             ast.E.position);
      // Check the loop body.
      ast.C.visit(this, null);
      return null;
    }