
CS 3304 Comparative Languages

Lecture 6: Semantic Analysis. 2 February 2012.

albertd

Presentation Transcript


  1. CS 3304 Comparative Languages • Lecture 6: Semantic Analysis • 2 February 2012

  2. Introduction • A context-free grammar can become very complex if it tries to do too much. • A rule that requires the compiler to compare things that are separated by long distances, or to count things that are not properly nested, ends up being a matter of semantics. • Semantic rules: • Static: enforced by the compiler at compile time. • Dynamic: enforced by code that the compiler generates for execution at run time. • Following parsing, the next two phases of the “typical” compiler are: • Semantic analysis. • (Intermediate) code generation. • Both are described in terms of annotations (attributes) of a parse tree. • Attribute grammars provide a formal framework.

  3. Semantic Analyzer • The principal job of the semantic analyzer is to enforce static semantic rules: • It usually constructs a syntax tree first. • The information it gathers is needed by the code generator. • This interface is the boundary between the front end and the back end. • There is considerable variety in the extent to which parsing, semantic analysis, and intermediate code generation are interleaved. • Fully separated phases: a full parse tree, a syntax tree, and a semantic check. • Fully interleaved phases: no need to build both parse and syntax trees. • A common approach interleaves construction of a syntax tree with parsing (no explicit parse tree), followed by separate, sequential phases for semantic analysis and code generation.

  4. Dynamic Checks • Many compilers have an option to enable/disable code generation for dynamic checks: disabling them in production is questionable. • The consequence of an undetected error in production use is significantly worse than one caught during testing. • Sometimes dynamic checks are cheap: they execute in instruction slots that would otherwise go unused. • Sometimes dynamic checks are expensive: e.g., checking pointer arithmetic in C.

  5. Assertions • An assertion is a statement that a specified condition is expected to be true when execution reaches a certain point in the code (Java): assert denominator != 0; • Many languages support assertions via a standard library (C): assert(denominator != 0); • Other constructs include invariants, preconditions, and postconditions (Euclid, Eiffel): these can cover a potentially large number of places where an assertion would otherwise be required. • Semantic checking incurs significant run-time overhead, but with recent hardware advances it has become quite feasible.
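The Java assertion above can be seen in context in a minimal sketch (the class and method names here are invented for illustration; assertions fire only when the JVM is run with -ea):

```java
public class AssertDemo {
    static int divide(int numerator, int denominator) {
        // Dynamic semantic check: fails with AssertionError when
        // assertions are enabled (java -ea) and denominator == 0.
        assert denominator != 0 : "denominator must be non-zero";
        return numerator / denominator;
    }

    public static void main(String[] args) {
        System.out.println(divide(10, 2)); // prints 5
    }
}
```

Without -ea the assert compiles to a no-op, which is exactly the enable/disable trade-off for dynamic checks discussed on the previous slide.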

  6. Static Analysis • Compile-time algorithms that predict run-time behavior. • An analysis is precise if it allows the compiler to determine whether a given program will always follow the rules: e.g., type checking. • It is also useful when not precise: a combination of compile-time checks and generated code for run-time checking. • Static analysis is also used for code improvement: • Alias analysis: determines when values can be safely cached in registers. • Escape analysis: determines whether all references to a value are confined to a given context. • Subtype analysis: determines whether an OO variable is guaranteed to be of a certain subtype. • Unsafe and speculative optimization. • Conservative and optimistic compilers. • Some languages have tighter semantic rules to avoid dynamic checking.

  7. Attribute Grammars • Both semantic analysis and (intermediate) code generation can be described in terms of annotation, or “decoration” of a parse or syntax tree. • Attribute grammars provide a formal framework for decorating such a tree. • We'll start with decoration of parse trees, then consider syntax trees.

  8. Example Grammar • Consider the following LR (bottom-up) grammar for arithmetic expressions made of constants, with precedence and associativity (it says nothing about what the program means):
  • E → E + T
  • E → E - T
  • E → T
  • T → T * F
  • T → T / F
  • T → F
  • F → - F
  • F → ( E )
  • F → const

  9. Example Attribute Grammar SLR(1) • We can turn this into an attribute grammar as follows (similar to Figure 4.1):
  • E → E + T	E1.val = E2.val + T.val
  • E → E - T	E1.val = E2.val - T.val
  • E → T	E.val = T.val
  • T → T * F	T1.val = T2.val * F.val
  • T → T / F	T1.val = T2.val / F.val
  • T → F	T.val = F.val
  • F → - F	F1.val = - F2.val
  • F → ( E )	F.val = E.val
  • F → const	F.val = C.val
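Because every rule here is synthesized, the val attribute can be computed by a simple bottom-up walk over the tree. A minimal Java sketch (the Node, Const, and BinOp classes are hypothetical, written only to illustrate the rules above):

```java
// Each node computes its synthesized "val" from its children, mirroring
// rules such as E -> E + T : E1.val = E2.val + T.val.
abstract class Node {
    abstract int val();
}

class Const extends Node {
    final int c;
    Const(int c) { this.c = c; }
    int val() { return c; }          // F -> const : F.val = C.val
}

class BinOp extends Node {
    final char op;
    final Node left, right;
    BinOp(char op, Node left, Node right) {
        this.op = op; this.left = left; this.right = right;
    }
    int val() {                      // attributes of children are computed first
        int a = left.val(), b = right.val();
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            default:  return a / b;
        }
    }
}

public class AttrDemo {
    public static void main(String[] args) {
        // Tree for (1 + 3) * 2; the root's val is the expression's value.
        Node tree = new BinOp('*',
                new BinOp('+', new Const(1), new Const(3)),
                new Const(2));
        System.out.println(tree.val()); // prints 8
    }
}
```

The recursion order is exactly the attribute flow of an S-attributed grammar: each val is defined purely in terms of vals below it.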

  10. Attribute Rules • The attribute grammar serves to define the semantics of the input program. • Attribute rules are best thought of as definitions, not assignments. • They are not necessarily meant to be evaluated at any particular time, or in any particular order, though they do define their left-hand side in terms of the right-hand side.

  11. Annotation • The process of evaluating attributes is called annotation, or decoration, of the parse tree (see Figure 4.2 for (1+3)*2): • When a parse tree under this grammar is fully decorated, the value of the expression will be in the val attribute of the root. • The code fragments for the rules are called semantic functions: • Strictly speaking, they should be cast as functions, e.g.: E1.val = sum(E2.val, T.val). • This is a very simple attribute grammar: • Each symbol has at most one attribute; the punctuation marks have no attributes. • These attributes are all so-called synthesized attributes: • They are calculated only from the attributes of things below them in the parse tree.

  12. Decoration of a Parse Tree

  13. Evaluating Attributes • In general, we are allowed both synthesized and inherited attributes. • Inherited attributes may depend on things above or to the side of them in the parse tree. • Tokens have only synthesized attributes, initialized by the scanner (name of an identifier, value of a constant, etc.). • Inherited attributes of the start symbol constitute run-time parameters of the compiler.

  14. Synthesized Attributes • The S-attributed grammar uses only synthesized attributes. • Its attribute flow (attribute dependence graph) is purely bottom-up. • The arguments to semantic functions in an S-attributed grammar are always attributes of symbols on the right-hand side of the current production. • The return value is always placed into an attribute of the left hand side of the production. • The intrinsic properties of tokens are synthesized attributes initialized by the scanner.

  15. Inherited Attributes • Inherited attributes: values are calculated when their symbol is on the right-hand side of the current production. • Contextual information flows into a symbol from above or from the side: it provides context. • Symbol table information is commonly passed by means of inherited attributes. • Inherited attributes of the root of the parse tree can be used to represent the external environment. • Example: left-to-right associativity may create a situation where an S-attributed grammar would be cumbersome to use. By passing attribute values left to right in the tree, things are much simpler.

  16. Example Attribute Grammar LL(1)
  • E → T TT	E.v = TT.v	TT.st = T.v
  • TT1 → + T TT2	TT1.v = TT2.v	TT2.st = TT1.st + T.v
  • TT1 → - T TT2	TT1.v = TT2.v	TT2.st = TT1.st - T.v
  • TT → ε	TT.v = TT.st
  • T → F FT	T.v = FT.v	FT.st = F.v
  • FT1 → * F FT2	FT1.v = FT2.v	FT2.st = FT1.st * F.v
  • FT1 → / F FT2	FT1.v = FT2.v	FT2.st = FT1.st / F.v
  • FT → ε	FT.v = FT.st
  • F1 → - F2	F1.v = - F2.v
  • F → ( E )	F.v = E.v
  • F → const	F.v = C.v
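In a recursive-descent parser, the inherited st ("subtotal") attribute becomes an ordinary parameter of the tail procedures, so values flow left to right exactly as the rules above require. A sketch under simplifying assumptions (class and method names are invented, tokens are pre-split into an array, and error handling is omitted):

```java
public class LLAttrDemo {
    private final String[] toks;
    private int pos = 0;

    LLAttrDemo(String[] toks) { this.toks = toks; }

    private String peek() { return pos < toks.length ? toks[pos] : "$"; }

    int expr() {                       // E -> T TT : TT.st = T.v; E.v = TT.v
        return termTail(term());
    }
    private int termTail(int st) {     // TT -> + T TT | - T TT | epsilon
        if (peek().equals("+")) { pos++; return termTail(st + term()); }
        if (peek().equals("-")) { pos++; return termTail(st - term()); }
        return st;                     // TT -> epsilon : TT.v = TT.st
    }
    private int term() {               // T -> F FT : FT.st = F.v; T.v = FT.v
        return factorTail(factor());
    }
    private int factorTail(int st) {   // FT -> * F FT | / F FT | epsilon
        if (peek().equals("*")) { pos++; return factorTail(st * factor()); }
        if (peek().equals("/")) { pos++; return factorTail(st / factor()); }
        return st;
    }
    private int factor() {             // F -> - F | ( E ) | const
        if (peek().equals("-")) { pos++; return -factor(); }
        if (peek().equals("(")) { pos++; int v = expr(); pos++; /* ')' */ return v; }
        return Integer.parseInt(toks[pos++]);
    }

    public static void main(String[] args) {
        // (1 + 3) * 2, already tokenized
        LLAttrDemo p = new LLAttrDemo(new String[]{"(", "1", "+", "3", ")", "*", "2"});
        System.out.println(p.expr()); // prints 8
    }
}
```

Note how 10 - 3 - 2 would evaluate as (10 - 3) - 2: passing the running subtotal down the tail recursion is what recovers left associativity despite the right-recursive grammar.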

  17. Parse Tree for (1+3)*2

  18. Attribute Flow • Well defined attribute grammar: its rules determine a unique set of values for attributes of every possible parse tree. • Noncircular attribute grammar: it never leads to a parse tree in which there are cycles in the attribute flow graph. • Translation scheme: an algorithm that decorates parse trees by invoking the rules of an attribute grammar in an order consistent with the tree’s attribute flow. • The LL(1) attribute grammar is a good bit messier than the SLR(1) one, but it is still L-attributed (the attributes can be evaluated in a single left-to-right pass over the input): • L-attributed grammars are the most general class of attribute grammars that can be evaluated during an LL parse. • S-attributed grammars are the most general class of attribute grammars that can be evaluated during an LR parse.

  19. Parsers and Attribute Grammars • Each synthesized attribute of a LHS symbol (by definition of synthesized) depends only on attributes of its RHS symbols. • A bottom-up parser is in general paired with an S-attributed grammar. • Each inherited attribute of a RHS symbol (by definition of L-attributed) depends only on: • Inherited attributes of the LHS symbol, or • Synthesized or inherited attributes of symbols to its left in the RHS. • A top-down parser is in general paired with an L-attributed grammar. • There are certain tasks, such as generation of code for short-circuit Boolean expression evaluation, that are easiest to express with non-L-attributed attribute grammars. • Because of the potential cost of complex traversal schemes, however, most real-world compilers insist that the grammar be L-attributed.

  20. Building a Syntax Tree • If intermediate code generation is interleaved with parsing, there is no need to build a syntax tree. • One-pass compiler: a compiler that interleaves semantic analysis and code generation with parsing. • Semantic analysis is easier to perform during a separate traversal of a syntax tree: • Add attribute rules to the context-free grammar. • The attributes point to nodes of a syntax tree. • Two examples: • A bottom-up attribute grammar. • A top-down attribute grammar.
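When attributes point to syntax-tree nodes, each rule's semantic function simply allocates a node whose children are the subtrees already built for the right-hand side, as the next figures illustrate for (1+3)*2. A small sketch of that idea (TreeNode is a hypothetical class, not from the lecture's code):

```java
// Each reduction builds one node from the subtrees of its RHS symbols,
// so the finished attribute of the root is the whole syntax tree.
class TreeNode {
    final String label;
    final TreeNode[] children;
    TreeNode(String label, TreeNode... children) {
        this.label = label;
        this.children = children;
    }
    @Override
    public String toString() {
        if (children.length == 0) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (TreeNode c : children) sb.append(" ").append(c);
        return sb.append(")").toString();
    }
}

public class TreeDemo {
    public static void main(String[] args) {
        // Nodes in the order a bottom-up parse of (1+3)*2 would create them:
        TreeNode sum  = new TreeNode("+", new TreeNode("1"), new TreeNode("3"));
        TreeNode prod = new TreeNode("*", sum, new TreeNode("2"));
        System.out.println(prod); // prints (* (+ 1 3) 2)
    }
}
```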

  21. Bottom-Up Attribute Grammar

  22. Two Leaves: Constants 1 and 3

  23. First Internal Node: 1+3

  24. Third Leaf: Constant 2

  25. Second Internal Node: (1+3)*2

  26. Top-Down Attribute Grammar

  27. First Leaf: Constant 1

  28. Second Leaf: Constant 3

  29. Third Leaf: Constant 2

  30. ANTLR Grammar: Program.g
  grammar Program;
  program: statement+ ;
  statement: expression NEWLINE
      | ID '=' expression NEWLINE
      | NEWLINE
      ;
  expression: multiplicationExpression (('+'|'-') multiplicationExpression)* ;
  multiplicationExpression: atom ('*' atom)* ;
  ID: ('a'..'z'|'A'..'Z')+ ;
  INT: '0'..'9'+ ;
  NEWLINE: '\r'? '\n' ;
  WS: (' '|'\t')+ {skip();} ;
  atom: INT | ID | '(' expression ')' ;

  31. Using ANTLR • From the Generate menu, select the Generate Code menu item. • In the gencode subdirectory, three files are generated: • Program.tokens: the list of token-name, token-type assignments. • ProgramLexer.java: the lexer (scanner) generated from Program.g. • ProgramParser.java: the parser generated from Program.g. • Create a tester class (with main), e.g. RunProgram.java. • Compile and run: javac RunProgram.java ProgramParser.java ProgramLexer.java, then java RunProgram. • Make sure that the ANTLR jar file is in your class path or included in your Java installation. • ProgramEvaluation.g adds evaluation statements (in Java) to Program.g (an attribute grammar).

  32. Main Class: RunProgram.java
  import org.antlr.runtime.*;
  public class RunProgram {
      public static void main(String[] args) throws Exception {
          ProgramParser parser = new ProgramParser(
              new CommonTokenStream(
                  new ProgramLexer(
                      new ANTLRInputStream(System.in))));
          parser.program();
      }
  }

  33. Evaluate: ProgramEvaluation.g I
  grammar ProgramEvaluation;
  @header { import java.util.HashMap; }
  @members { HashMap<String, Integer> symbolTable = new HashMap<String, Integer>(); }
  program: statement+ ;
  statement: expression NEWLINE {System.out.println($expression.value);}
      | ID '=' expression NEWLINE {symbolTable.put($ID.text, new Integer($expression.value));}
      | NEWLINE
      ;

  34. Evaluate: ProgramEvaluation.g II
  expression returns [int value]
      : e=multiplicationExpression {$value = $e.value;}
        ('+' e=multiplicationExpression {$value += $e.value;}
        |'-' e=multiplicationExpression {$value -= $e.value;}
        )*
      ;
  multiplicationExpression returns [int value]
      : e=atom {$value = $e.value;}
        ('*' e=atom {$value *= $e.value;}
        )*
      ;

  35. Evaluate: ProgramEvaluation.g III
  atom returns [int value]
      : INT {$value = Integer.parseInt($INT.text);}
      | ID {Integer v = (Integer)symbolTable.get($ID.text);
            if ( v!=null ) $value = v.intValue();
            else System.err.println("undefined variable "+$ID.text);
           }
      | '(' expression ')' {$value = $expression.value;}
      ;
  ID: ('a'..'z'|'A'..'Z')+ ;
  INT: '0'..'9'+ ;
  NEWLINE: '\r'? '\n' ;
  WS: (' '|'\t'|'\n'|'\r')+ {skip();} ;

  36. Summary • The enforcement of static semantic rules and the generation of intermediate code can be cast in terms of annotation of a parse tree. • An attribute grammar associates attributes with each symbol in a context-free grammar and attribute rules with each production. • Attributes are either synthesized (bottom-up) or inherited (top-down). • The syntax tree is a tree representation of the abstract syntactic structure of source code.
