6. Semantic Analysis
Semantic Analysis Phase
• Purpose: compute additional information needed for compilation that is beyond the capabilities of context-free grammars and standard parsing algorithms
• Static semantic analysis: takes place prior to execution (for example, building a symbol table, performing type inference, and type checking)
• Classification
  • Analysis of a program required by the rules of the programming language to establish its correctness and to guarantee proper execution
  • Analysis performed by a compiler to enhance the efficiency of execution of the translated program
Description of the static semantic analysis
• Attribute grammar
  • Identify the attributes of language entities that must be computed, and write attribute equations (semantic rules) that express how the computation of those attributes is related to the grammar rules of the language
  • Most useful for languages that obey the principle of syntax-directed semantics
• Abstract syntax, as represented by an abstract syntax tree
Contents
6.1 Attributes and Attribute Grammars
6.3 The Symbol Table
6.4 Data Types and Type Checking
Attributes
• An attribute is any property of a programming language construct, such as
  • The data type of a variable
  • The value of an expression
  • The location of a variable in memory
  • The object code of a procedure
  • The number of significant digits in a number
• Binding of the attribute
  • The process of computing an attribute and associating its computed value with the language construct
• Binding time
  • The time during the compilation/execution process when the binding of an attribute occurs
  • Based on binding time, attributes are divided into static attributes (bound prior to execution) and dynamic attributes (bound during execution)
Example: the binding time and significance during compilation of attributes
• Attribute computations are extremely varied
• Type checker
  • A type checker is a semantic analyzer that computes the data type attribute of all language entities for which data types are defined and verifies that these types conform to the type rules of the language
  • In a language like C or Pascal, the type checker is an important part of semantic analysis
  • In a language like LISP, data types are dynamic, so a LISP compiler must generate code to compute types and perform type checking during program execution
• The values of expressions
  • Usually dynamic, and so computed during execution
  • But sometimes they can also be evaluated during compilation (constant folding)
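Constant folding can be illustrated with a small sketch. The tuple-based expression tree and the `fold` function below are hypothetical, chosen only for illustration; the idea is that a subexpression whose operands are all known at compile time gets a static value, while anything that depends on a variable stays dynamic.

```python
import operator

# Hypothetical mini-AST: an int literal, a variable name (str),
# or a tuple (op, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Return an equivalent tree with constant subexpressions evaluated."""
    if not isinstance(node, tuple):
        return node  # literal or variable: nothing to fold
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)  # value bound at compile time (static)
    return (op, left, right)         # value must be computed at run time (dynamic)

# x + 2 * 21: only the right operand can be folded.
folded = fold(("+", "x", ("*", 2, 21)))
```

Here `folded` keeps the addition, because `x` is unknown until execution, but the multiplication has been replaced by its value.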
In syntax-directed semantics, attributes are associated directly with the grammar symbols of the language.
• X.a means the value of attribute a associated with X, where X is a grammar symbol and a is an attribute of X
• Given a collection of attributes a1, …, ak, syntax-directed semantics implies that for each grammar rule X0 → X1 X2 … Xn (X0 is a nonterminal), the values of the attributes Xi.aj of each grammar symbol Xi are related to the values of the attributes of the other symbols in the rule
An attribute grammar for attributes a1, a2, …, ak is the collection of all attribute equations (semantic rules) of the following form, for all the grammar rules of the language:
$X_i.a_j = f_{ij}(X_0.a_1, \ldots, X_0.a_k, X_1.a_1, \ldots, X_1.a_k, \ldots, X_n.a_1, \ldots, X_n.a_k)$
• where $f_{ij}$ is a mathematical function of its arguments
• Typically, attribute grammars are written in tabular form as follows:
Grammar Rule    Semantic Rules
Rule 1          Associated attribute equations
...             ...
Rule n          Associated attribute equations
Example 6.1: consider the following simple grammar for unsigned numbers:
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
The most significant attribute is the numeric value (written val), and the corresponding attribute grammar is as follows:
[Table: grammar rules and semantic rules for val]
The parse tree showing attribute computations for the number 345 is given as follows:
[Figure: parse tree for 345 annotated with val attributes]
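The attribute table and parse tree are not reproduced in this text, so the sketch below assumes the usual equations for this grammar: digit.val is the digit's value, number.val = digit.val for number → digit, and number1.val = number2.val * 10 + digit.val for number1 → number2 digit. The tuple tree layout and the function names are hypothetical.

```python
DIGITS = "0123456789"

def parse_number(text):
    """Build a left-recursive parse tree: ('number', prefix_tree_or_None, digit)."""
    if not text or any(ch not in DIGITS for ch in text):
        raise ValueError(f"not an unsigned number: {text!r}")
    tree = None
    for ch in text:
        tree = ("number", tree, ch)  # number -> number digit (or number -> digit)
    return tree

def val(tree):
    """Compute the synthesized attribute val bottom-up."""
    _, prefix, digit = tree
    digit_val = DIGITS.index(digit)      # digit.val
    if prefix is None:
        return digit_val                 # number -> digit
    return val(prefix) * 10 + digit_val  # number1 -> number2 digit

value_345 = val(parse_number("345"))
```

Each call to `val` corresponds to one interior node of the parse tree: the leftmost node yields 3, its parent 34, and the root 345.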
Example 6.2: consider the following grammar for simple integer arithmetic expressions:
exp → exp + term | exp - term | term
term → term * factor | factor
factor → ( exp ) | number
The principal attribute of an exp (or term or factor) is its numeric value (written val), and the attribute equations for the val attribute are given as follows:
[Table: grammar rules and semantic rules for val]
Given the expression (34-3)*42, the computations implied by this attribute grammar, shown by attaching equations to the nodes of a parse tree, are as follows:
[Figure: parse tree for (34-3)*42 annotated with val attributes]
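As a rough illustration, the val attribute for this expression grammar can be computed by a small recursive-descent evaluator. This is a sketch under assumptions: the left-recursive rules are handled with loops (a standard transformation, not something the slides prescribe), the equations are assumed to be the obvious arithmetic ones (for example exp1.val = exp2.val + term.val), and the class and method names are hypothetical.

```python
import re

def tokenize(text):
    # Numbers and the five punctuation tokens; whitespace is skipped.
    return re.findall(r"\d+|[-+*()]", text)

class Evaluator:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def exp(self):
        value = self.term()                  # exp -> term
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value = value + self.term()  # exp1.val = exp2.val + term.val
            else:
                value = value - self.term()  # exp1.val = exp2.val - term.val
        return value

    def term(self):
        value = self.factor()                # term -> factor
        while self.peek() == "*":
            self.take()
            value = value * self.factor()    # term1.val = term2.val * factor.val
        return value

    def factor(self):
        tok = self.take()
        if tok == "(":
            value = self.exp()               # factor.val = exp.val
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)                      # factor.val = number.val

result = Evaluator("(34-3)*42").exp()
```

For the expression above, the inner exp evaluates to 31 and the enclosing term to 31 * 42 = 1302.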
Example 6.3: consider the following simple grammar of variable declarations in a C-like syntax:
decl → type var-list
type → int | float
var-list → id , var-list | id
Define a data type attribute for the variables given by the identifiers in a declaration, and write equations expressing how the data type attribute is related to the type of the declaration, as follows (we use the name dtype to distinguish the attribute from the nonterminal type):
[Table: grammar rules and semantic rules for dtype]
Note that there is no equation involving the dtype of the nonterminal decl. It is not necessary for the value of an attribute to be specified for all grammar symbols.
The parse tree for the string float x,y, showing the dtype attribute as specified by the attribute grammar above, is as follows:
[Figure: parse tree for float x,y annotated with dtype attributes]
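A minimal sketch of how dtype flows through a declaration, assuming the usual equations for this grammar: type.dtype is synthesized from the keyword, var-list.dtype is inherited from type.dtype, and each id inherits dtype from its enclosing var-list. The function names and the dtype values "integer" and "real" are assumptions made for illustration.

```python
def declare(text):
    """Return {identifier: dtype} for a declaration such as 'float x,y'."""
    type_word, _, rest = text.strip().partition(" ")
    dtype = {"int": "integer", "float": "real"}[type_word]  # type.dtype (synthesized)
    names = [name.strip() for name in rest.split(",")]
    return var_list(names, dtype)  # var-list.dtype = type.dtype (inherited)

def var_list(names, dtype):
    """dtype is passed down from the parent to the id and to the rest of the list."""
    head, tail = names[0], names[1:]
    table = {head: dtype}                    # id.dtype = var-list.dtype
    if tail:
        table.update(var_list(tail, dtype))  # var-list2.dtype = var-list1.dtype
    return table

dtypes = declare("float x,y")
```

Note that `declare` returns only the identifiers' attributes; nothing is computed for decl itself, which matches the remark that decl has no dtype equation.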
Example 6.4: consider the following grammar, where numbers may be octal or decimal, indicated by a one-character suffix o (for octal) or d (for decimal):
based-num → num basechar
basechar → o | d
num → num digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
In this case num and digit require a new attribute base, which is used to compute the val attribute. The attribute grammar for base and val is given as follows:
[Table: grammar rules and semantic rules for base and val]
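The interplay of the two attributes can be sketched as follows, assuming base is synthesized at basechar (8 for o, 10 for d), inherited by num and digit, and that val is then computed as num1.val = num2.val * num1.base + digit.val. Treating a digit that is too large for the base as an error value is also an assumption, as are all the names used here; the sketch expects well-formed input of at least one digit followed by a suffix.

```python
ERROR = None  # hypothetical marker for an invalid digit in the given base

def based_num_val(text):
    digits, basechar = text[:-1], text[-1]
    base = {"o": 8, "d": 10}[basechar]  # basechar.base (synthesized)
    return num_val(digits, base)        # num.base inherited from basechar.base

def num_val(digits, base):
    last = digit_val(digits[-1], base)  # digit.base = num.base
    if len(digits) == 1:
        return last                     # num -> digit
    prefix = num_val(digits[:-1], base) # num2.base = num1.base
    if prefix is ERROR or last is ERROR:
        return ERROR
    return prefix * base + last         # num1.val = num2.val * num1.base + digit.val

def digit_val(ch, base):
    value = int(ch)
    return value if value < base else ERROR  # 8 and 9 are not octal digits

octal_value = based_num_val("345o")
decimal_value = based_num_val("345d")
```

The base has to travel down the tree before any val can travel up, which is why base is an inherited attribute here while val stays synthesized.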
The symbol table is the major inherited attribute in a compiler and, after the syntax tree, forms the major data structure.
• The operations of the symbol table are
  • insert: stores the information provided by name declarations when those declarations are processed
  • lookup: retrieves the information associated with a name when that name is used in the associated code
  • delete: removes the information provided by a declaration when that declaration no longer applies
The structure of the symbol table
• The symbol table in a compiler is a typical dictionary data structure; common implementations include
  • Linear lists: constant-time insert operations; can be chosen if speed is not a concern
  • Various search tree structures (binary search trees, AVL trees, B-trees): less useful, because they do not provide best-case efficiency and because of the complexity of the delete operation
  • Hash tables: usually the best choice, because all three operations can be performed in almost constant time
The best scheme for compiler construction is the alternative to open addressing, called separate chaining.
• In this method each bucket is actually a linear list, and collisions are resolved by inserting the new item into the bucket list
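A minimal sketch of a separately chained symbol table with the insert, lookup, and delete operations. The class name, the default bucket count of 211, and the simple character-sum hash are assumptions made for illustration. New entries go at the front of the bucket list, so the most recent declaration of a name is found first.

```python
class SymbolTable:
    def __init__(self, size=211):  # 211 is an arbitrary prime bucket count
        self.size = size
        self.buckets = [[] for _ in range(size)]  # each bucket is a linear list

    def _index(self, name):
        # Placeholder hash: sum of character codes, scaled with mod.
        return sum(ord(ch) for ch in name) % self.size

    def insert(self, name, info):
        # A collision is resolved by adding to the bucket list (at the front).
        self.buckets[self._index(name)].insert(0, (name, info))

    def lookup(self, name):
        for key, info in self.buckets[self._index(name)]:
            if key == name:
                return info  # most recent declaration wins
        return None

    def delete(self, name):
        bucket = self.buckets[self._index(name)]
        for i, (key, _) in enumerate(bucket):
            if key == name:
                del bucket[i]  # removes the most recent declaration only
                return True
        return False

table = SymbolTable()
table.insert("i", "int")
table.insert("i", "char")
found = table.lookup("i")
```

Because deletion removes only the front-most matching entry, deleting the `char` entry makes the earlier `int` entry visible again.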
Separate chaining
• The size of the bucket array typically ranges from a few hundred to over a thousand
• The size of the bucket array should be chosen to be a prime number
• Hash function for a symbol table
  • Converts the character string into an integer in the range 0 … size-1, in three steps
  • Convert each character into a nonnegative integer
    • C uses the ASCII values
    • Pascal uses the ord function
  • Combine all the integers into a single integer
    • Programmer's choice
  • Scale the resulting integer to the range 0 … size-1
Separate chaining (cont.)
• The mod function is used to scale a number into the range 0 … size-1
• Combining all the integers into a single integer
  • One simple method is to use only the first few characters, or the first, middle, and last characters, and ignore the rest
    • Inadequate: programmers use names like temp1, temp2, …, which cause frequent collisions
  • The chosen method should reduce collisions
  • One good method is to multiply the partial hash value by a constant factor $\alpha$ before adding each character
  • If $c_i$ is the numeric value of the $i$th character and $h_i$ is the partial hash value computed at the $i$th step (with $h_0 = 0$), then
    $h_i = \alpha h_{i-1} + c_i$
  • For a name of $n$ characters, this generalizes to the final hash value
    $h = (\alpha^{n-1} c_1 + \alpha^{n-2} c_2 + \cdots + \alpha c_{n-1} + c_n) \bmod size$
Separate chaining (cont.)
• The choice of the multiplying constant in this formula has a significant effect on the outcome
• A reasonable choice for the constant is a power of 2 (such as 2, 4, 8, 16, 32, 64, 128, …), so that the multiplication can be performed as a shift
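The multiplicative hash can be sketched as below, next to the "first few characters" scheme it is meant to replace. The function names and the defaults (size 211, multiplier 16) are assumptions for illustration. Each step multiplies the partial hash by the constant and adds the next character code; taking the remainder at every step gives the same result as taking it once at the end, while keeping the intermediate values small.

```python
def hash_name(name, size=211, alpha=16):
    """Multiplicative string hash: h = alpha * h + c for each character, mod size."""
    h = 0
    for ch in name:
        h = (alpha * h + ord(ch)) % size  # reduce at each step to keep h small
    return h

def first_chars_hash(name, size=211, count=3):
    """Naive scheme that looks only at the first few characters."""
    return sum(ord(ch) for ch in name[:count]) % size

same_bucket_naive = first_chars_hash("temp1") == first_chars_hash("temp2")
same_bucket_alpha = hash_name("temp1") == hash_name("temp2")
```

With these definitions `temp1` and `temp2` collide under the naive scheme, since their first three characters are identical, but land in different buckets under the multiplicative one.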
Declarations
• There is no single standard symbol table design for all compilers
• The symbol table varies from compiler to compiler, depending on
  • How the compiler calls the insert and delete operations
  • How variables are declared
  • How long the symbol table needs to exist
• The behavior of the symbol table depends heavily on the properties of declarations of the language being translated
• There are 4 basic kinds of declarations
  • Constant
    • const int SIZE = 199
  • Type
    • Structures and unions
  • Variable
    • Variable declarations: int a, b[100]
Declarations (cont.)
  • Procedure/function
    • Are constant declarations of procedure/function type (similar to the other declarations)
    • Declarations are usually explicit (a name must be declared before use)
    • In a few languages they are implicit (a name need not be declared before use)
      • Fortran, BASIC
• We can create one single symbol table for all 4 kinds of declarations above
• Constant declarations (value bindings) associate values with names
  • They are saved in the symbol table, and their values never change there
  • When generating code, the compiler can replace uses of constant names with their values, taken from the symbol table
Declarations (cont.)
• Type declarations, variable declarations, and function/procedure declarations are also saved in the symbol table, with only minor differences between them
Scope rules and block structure
• In the above program there are five blocks
• First block: the variables int i, j and the function f have scope over the entire program (global)
• Second block: the variable size has its own scope
• Third block: the variables i and temp have a different scope
Scope rules and block structure (cont.)
• Third block: the variables i and temp have a different scope
  • Here the variable i is of type char, but in the global scope it is of type int
  • Until we leave this third block (function f), i has type char; once we leave it, i has type int again
  • The compiler gives priority to local variables
  • When we leave the block, the char variable i is deleted and the int variable i becomes visible again
Scope rules and block structure (cont.)
• While this third block is being processed, the symbol table looks like
[Figure: symbol table contents during the third block]
Scope rules and block structure (cont.)
• The variable i is stored twice, once for each type
• The char entry for i comes first, so a lookup of i returns the char variable, not the int variable
• When we leave the block, i (char) and temp (char) are deleted from the symbol table
Scope rules and block structure (cont.)
• Fourth block: the variable j of type double is added to the table, and the symbol table looks like
[Figure: symbol table contents during the fourth block]
Scope rules and block structure (cont.)
• At the end of the function f block, all the variables declared in it are deleted, and the symbol table looks like
[Figure: symbol table contents after function f]
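The insert-on-entry, delete-on-exit behaviour described in these slides can be sketched with a single table that remembers which names each open block declared. The program itself is not reproduced in this text, so the declarations below are reconstructed from the bullets, and the types given for f and size are assumptions; the class and method names are hypothetical.

```python
class ScopedTable:
    def __init__(self):
        self.entries = {}   # name -> list of types, most recent declaration first
        self.scopes = [[]]  # names declared in each open block; scopes[0] is global

    def enter_scope(self):
        self.scopes.append([])

    def insert(self, name, dtype):
        self.entries.setdefault(name, []).insert(0, dtype)
        self.scopes[-1].append(name)

    def lookup(self, name):
        declared = self.entries.get(name)
        return declared[0] if declared else None

    def exit_scope(self):
        # Delete every declaration made in the block that is being closed.
        for name in self.scopes.pop():
            self.entries[name].pop(0)

table = ScopedTable()
table.insert("i", "int")       # first block: globals
table.insert("j", "int")
table.insert("f", "function")  # assumed entry for function f
table.enter_scope()            # second block: parameters of f
table.insert("size", "int")    # type of size is an assumption
table.enter_scope()            # third block
table.insert("i", "char")
table.insert("temp", "char")
i_inside = table.lookup("i")
table.exit_scope()             # leaving the third block
i_after = table.lookup("i")
table.enter_scope()            # fourth block
table.insert("j", "double")
j_inside = table.lookup("j")
table.exit_scope()
table.exit_scope()             # end of function f
```

After the final `exit_scope`, only the global entries for i, j, and f remain visible.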
Scope rules and block structure (cont.)
• One more way to implement scopes is to use a different symbol table for each block (for the blocks seen in the previous slides); the practical implementation will look like
[Figure: separate symbol tables, one per block]
Symbol table example
    class Foo {
        int value;
        int test() {
            int b = 3;
            return value + b;
        }
        void setValue(int c) {
            value = c;
            {
                int d = c;
                c = c + d;
                value = c;
            }
        }
    }
    class Bar {
        int value;
        void setValue(int c) {
            value = c;
        }
    }
Scope annotations in Foo: scope of value, scope of b, scope of c, scope of d (block1). Scope annotations in Bar: scope of value, scope of c.
Symbol table example (cont.)
[Figure: one symbol table per scope of class Foo: (Foo), (test), (setValue), (block1)]
Checking scope rules
[Figure: symbol tables (Foo), (test), (setValue), (block1), with lookup(value)]
    void setValue(int c) {
        value = c;
        {
            int d = c;
            c = c + d;
            value = c;
        }
    }
Catching semantic errors
[Figure: symbol tables (Foo), (test), (setValue), (block1), with lookup(myValue) → Error! Undefined symbol]
    void setValue(int c) {
        value = c;
        {
            int d = c;
            c = c + d;
            myValue = c;
        }
    }
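A sketch of per-scope symbol tables for the Foo example, assuming each table keeps a link to the table of its enclosing scope and that lookup walks outward until the name is found. The `Scope` class and the use of `NameError` to report an undefined symbol are illustrative choices, not something taken from the slides.

```python
class Scope:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # table of the enclosing scope, or None
        self.symbols = {}

    def insert(self, symbol, info):
        self.symbols[symbol] = info

    def lookup(self, symbol):
        scope = self
        while scope is not None:  # search from the innermost scope outward
            if symbol in scope.symbols:
                return scope.symbols[symbol], scope.name
            scope = scope.parent
        raise NameError(f"undefined symbol: {symbol}")

foo = Scope("Foo")
foo.insert("value", "int")
test = Scope("test", parent=foo)
test.insert("b", "int")
set_value = Scope("setValue", parent=foo)
set_value.insert("c", "int")
block1 = Scope("block1", parent=set_value)
block1.insert("d", "int")

value_info = block1.lookup("value")  # found two levels up, in Foo
try:
    block1.lookup("myValue")
    error_message = None
except NameError as err:
    error_message = str(err)         # no table on the chain declares myValue
```

Note that `b` is not visible from `block1`: the table for `test` is a sibling of `setValue`, so it is never on the search path.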
Principal task of a compiler
• Type inference: the computation and maintenance of information on data types
• Type checking: the use of that information to ensure that each part of the program makes sense under the type rules of the language
  • Type checking uses logical rules to reason about the behavior of a program at run time. Specifically, it ensures that the types of the operands match the types expected by an operator. For example, the && operator in Java expects its two operands to be booleans; the result is also of type boolean.
• These two tasks are related, are performed together, and are referred to jointly as type checking
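A toy checker can make the operand rule concrete. The tuple-based expression form, the `check` function, and the small rule set (&& on booleans, + and < on ints) are assumptions for illustration and do not cover Java's full type rules. The function infers the type of each subexpression and rejects an operator whose operands have the wrong types.

```python
def check(expr, env):
    """Return the type of expr, or raise TypeError if an operator is misused."""
    if isinstance(expr, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, str):
        return env[expr]        # variable: type comes from its declaration
    op, left, right = expr
    lt, rt = check(left, env), check(right, env)  # type inference on operands
    if op == "&&":
        if lt == rt == "boolean":
            return "boolean"
        raise TypeError(f"&& expects boolean operands, got {lt} and {rt}")
    if op in ("+", "<"):
        if lt == rt == "int":
            return "int" if op == "+" else "boolean"
        raise TypeError(f"{op} expects int operands, got {lt} and {rt}")
    raise ValueError(f"unknown operator {op!r}")

env = {"flag": "boolean", "n": "int"}
ok_type = check(("&&", "flag", ("<", "n", 10)), env)
try:
    check(("&&", "n", True), env)
    bad_message = None
except TypeError as err:
    bad_message = str(err)
```

The first expression is accepted because `n < 10` is inferred to be boolean before the && rule is applied; the second is rejected because `n` is an int.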
Data Type
• Data type information may be static, dynamic, or a combination of the two
  • Ex: in LISP, type inference and type checking are done during execution (dynamic)
  • Ex: in Pascal, C, and Ada, type information is primarily static and is used as the principal mechanism for checking the correctness of a program before execution (static)