CS 3304 Comparative Languages

CS 3304 Comparative Languages • Lecture 7: Syntax Tree • 7 February 2012

Introduction • We can tie this discussion back into the earlier issue of separated phases versus on-the-fly semantic analysis and/or code generation. • If semantic analysis and/or code generation are interleaved with parsing, then the translation scheme we use to evaluate attributes must be L-attributed. • If we break semantic analysis and code generation out into separate phase(s), then the code that builds the parse/syntax tree must still use a left-to-right (L-attributed) translation scheme. • However, the later phases are free to use a fancier translation scheme if they want.

Translation Scheme • There are automatic tools that construct a semantic analyzer (attribute evaluator) for a given attribute grammar. In other words, they generate translation schemes for context-free grammars or tree grammars (which describe the possible structure of a syntax tree). • These tools are heavily used in syntax-based editors and incremental compilers. • Most production compilers, however, use an ad hoc, handwritten translation scheme that: • Interleaves parsing with at least the initial construction of a syntax tree. • Possibly also interleaves all of semantic analysis and intermediate code generation. • When the attributes of each production are evaluated as the production is parsed, there is no need to build the full parse tree.

Action Routines I • An ad hoc translation scheme that is interleaved with parsing takes the form of a set of action routines: • An action routine is a semantic function that we tell the compiler to execute at a particular point in the parse. • If semantic analysis and code generation are interleaved with parsing, then action routines can be used to perform semantic checks and generate code. • LL parser generator: an action routine can appear anywhere within a right-hand side. • Implementation: when the parser predicts a production, it pushes all of the right-hand side onto the stack, including pointers to the action routines; when such a pointer reaches the top of the stack, the parser calls the routine.
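The stack-based implementation described above can be sketched as a small table-driven LL(1) parser. This is only an illustration under stated assumptions: the grammar (E → T TT, TT → + T TT | - T TT | ε, T → num), the parse table, and the routine names emit_num, emit_add, and emit_sub are hypothetical and not taken from the lecture. The action routines here emit postfix code rather than build a tree.

```python
def parse(tokens):
    """Table-driven LL(1) parse; action routines sit on the parse stack."""
    out = []          # postfix output produced by the action routines
    last = [None]     # most recently matched token

    # Hypothetical action routines (not from the lecture).
    def emit_num():
        out.append(last[0])

    def emit_add():
        out.append("+")

    def emit_sub():
        out.append("-")

    # Hypothetical parse table: (nonterminal, lookahead) -> right-hand side.
    # Action routines appear inside the right-hand sides, not only at the end.
    table = {
        ("E", "num"): ["T", "TT"],
        ("T", "num"): ["num", emit_num],
        ("TT", "+"): ["+", "T", emit_add, "TT"],
        ("TT", "-"): ["-", "T", emit_sub, "TT"],
        ("TT", "$$"): [],                      # epsilon production
    }
    nonterminals = {"E", "T", "TT"}

    toks = list(tokens) + ["$$"]
    pos = 0
    stack = ["$$", "E"]
    while stack:
        top = stack.pop()
        tok = toks[pos]
        kind = "num" if tok.isdigit() else tok
        if callable(top):
            top()                              # action routine reached the top
        elif top in nonterminals:
            rhs = table.get((top, kind))
            if rhs is None:
                raise SyntaxError(f"unexpected {tok!r}")
            stack.extend(reversed(rhs))        # predict: push the whole RHS
        else:
            if top != kind:                    # terminal: must match the input
                raise SyntaxError(f"expected {top!r}, got {tok!r}")
            last[0] = tok
            pos += 1
    return out


postfix = parse(["1", "+", "2", "-", "3"])
```

Placing emit_add between T and TT, rather than at the end of the production, is what makes the output left-associative.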

Action Routines II • If semantic analysis and code generation are broken out as separate phases, then action routines can be used to build a syntax tree: • A parse tree could be built completely automatically. • We wouldn't need action routines for that purpose. • Later compilation phases can then consist of ad hoc tree traversal(s), or can use an automatic tool to generate a translation scheme. • The PL/0 compiler uses ad hoc traversals that are almost (but not quite) left-to-right.

Action Routines Example (Figure 4.9) • For our LL(1) attribute grammar (Figure 4.6), we could put in explicit action routines:

Constructing Syntax Tree • Productions 2-5: term_tail procedure for a syntax tree: • The parameter is a pointer to the syntax tree fragment in TT1. • Inspects the upcoming input symbol. • Calls add_op to parse that symbol. • Calls term to parse the attribute grammar’s T. • Calls make_bin_op to create a new tree node. • Passes that node to term_tail, which parses TT2. • Returns the result.

procedure term_tail(lhs : tree_node_ptr)
    case input_token of
        +, - :
            op : string := add_op
            return term_tail(make_bin_op(op, lhs, term))
                -- term is a recursive call with no arguments
        ), id, read, write, $$ :    -- epsilon production
            return lhs
        otherwise parse_error
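A runnable approximation of the term_tail procedure, as a sketch in Python. Several things here are assumptions made for illustration: tokens are plain strings, term is simplified to consume a single operand, tree nodes are tuples, and the epsilon case checks only for ")" and "$$" instead of the full token set above. The Parser class and its helper methods are hypothetical.

```python
def make_bin_op(op, lhs, rhs):
    """Create a binary-operator tree node (a tuple in this sketch)."""
    return (op, lhs, rhs)


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$$"]   # "$$" marks end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def add_op(self):
        return self.advance()                 # consumes "+" or "-"

    def term(self):
        # Simplification: a term is a single operand token.
        return self.advance()

    def term_tail(self, lhs):
        tok = self.peek()
        if tok in ("+", "-"):
            op = self.add_op()
            # The tree built so far becomes the left child of the new node,
            # which is then handed to the recursive call (the grammar's TT2).
            return self.term_tail(make_bin_op(op, lhs, self.term()))
        if tok in (")", "$$"):                # epsilon production
            return lhs
        raise SyntaxError(f"unexpected {tok!r}")

    def expr(self):
        return self.term_tail(self.term())


tree = Parser(["a", "-", "b", "+", "c"]).expr()
```

Because the partial tree is passed down as lhs, the result nests to the left, so a - b + c is grouped as (a - b) + c.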

Bottom-Up Evaluation • An LR parser does not in general know what production it is in until it has seen all or most of the yield, so action routines cannot be embedded at arbitrary places in a right-hand side. • Action routines are allowed only after the point at which the production is identified unambiguously (the trailing part of the right-hand side). • The ambiguous part is the left corner. • If the attribute flow is strictly bottom-up, then execution at the end of the right-hand side is all that is needed. • If the action routines do a lot of semantic analysis, they need some contextual information. • That requires access to inherited attributes or to information outside the current production.

Space Management for Attributes • If there is a parse tree, the attributes can be stored in nodes. • For a bottom-up parser with an S-attributed grammar, maintain an attribute stack mirroring the parse stack: • Next to every state number is an attribute record for the symbol shifted when entering the state. • Entries are pushed and popped automatically. • For a top-down parser with an L-attributed grammar: • Automatic: an attribute stack that does not mirror the parse stack. • Short-cutting copy rules: action routines allocate and deallocate space for attributes explicitly. • Contextual information: • Symbol table that always represents the current referencing environment.
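The attribute stack for a bottom-up parser with an S-attributed grammar might look like the following sketch. It is not a full LR parser: the shift and reduce steps for the input 1 + 2 * 3 are written out by hand in the order an LR parser would perform them, grammar symbols stand in for state numbers, and the grammar (E → E + T | T, T → T * F | F, F → num) and the AttrStack class are assumptions made for illustration.

```python
class AttrStack:
    """Attribute stack kept in lockstep with a (simplified) parse stack."""

    def __init__(self):
        self.symbols = []   # stands in for the LR state numbers
        self.attrs = []     # one attribute record per parse-stack entry

    def shift(self, symbol, attr=None):
        # Shifting a token pushes its attribute record alongside it.
        self.symbols.append(symbol)
        self.attrs.append(attr)

    def reduce(self, lhs, rhs_len, rule):
        # Pop the records of the right-hand side, evaluate the semantic
        # rule on them, and push the synthesized result for the left-hand side.
        start = len(self.attrs) - rhs_len
        args = self.attrs[start:]
        del self.symbols[start:]
        del self.attrs[start:]
        self.symbols.append(lhs)
        self.attrs.append(rule(*args))


def same(value):
    """Copy rule: pass the child's value up unchanged."""
    return value


# Hand-written action sequence for the input 1 + 2 * 3.
stack = AttrStack()
stack.shift("num", 1)
stack.reduce("F", 1, same)                            # F -> num
stack.reduce("T", 1, same)                            # T -> F
stack.reduce("E", 1, same)                            # E -> T
stack.shift("+")
stack.shift("num", 2)
stack.reduce("F", 1, same)                            # F -> num
stack.reduce("T", 1, same)                            # T -> F
stack.shift("*")
stack.shift("num", 3)
stack.reduce("F", 1, same)                            # F -> num
stack.reduce("T", 3, lambda t, _op, f: t * f)         # T -> T * F
stack.reduce("E", 3, lambda e, _op, t: e + t)         # E -> E + T
```

Every semantic rule runs at the end of its right-hand side, which is all a strictly bottom-up attribute flow requires.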

Example: Calculator Language • A calculator language with types and declarations. • Declarations are intermixed with statements. • Differentiate between integer and real constants. • Explicit conversion between integer and real operands is required. • Every identifier should be declared before use. • The types should not be mixed in computations. • Constructing the syntax tree: adding semantic functions or action routines to the context free grammar for the calculator language.

Context Free Grammar

Example: Syntax Tree • Syntax tree for a simple program to print an average of an integer and a real (Figure 4.12).

Tree Grammar • Represents the possible structure of syntax trees. • A tree grammar production represents a possible relationship between a parent and its children in the tree. • No need for parsing. • Tree grammars provide a framework for the decoration of syntax trees. • Can be used to perform static semantic checking.

Example: Tree Grammar • Tree grammar representing the structure of the syntax tree in Figure 4.12. • The notation A : B on the left-hand side of a production means that A is one variant of B and may appear anywhere a B is expected on a right-hand side.

Example: Complete Tree Grammar • A sample from a complete tree grammar representing structure of syntax tree in Figure 4.12. • Constructed using node classes, variants, and attributes (inherited and synthesized), Figure 4.13. • Classes: program, item, and expr. • item variants: int_decl, real_decl, read, write, :=, and null.

Using Tree Grammar • The program node at the root of the syntax tree contains a list (synthesized attribute) of all static semantic errors. • Each item or expr node has an inherited attribute symtab that contains a list (with types) of all identifiers declared to the left in the tree. • Each item node has: • An inherited attribute errors_in that lists all static semantic errors found to its left in the tree. • A synthesized attribute errors_out to propagate the final error list back to the root. • Each expr node has: • A synthesized attribute that indicates its type. • A synthesized attribute that contains a list of any semantic errors found inside.

Decorating Syntax Tree (Figure 4.15) • Symbol table information flows along the chain of items and down into expr trees. • Type information is synthesized at id:expr leaves (from the symbol table). • The information then propagates upward within an expression tree and is used to type-check operators and assignments. • Error messages flow along the chain of items via the errors_in attributes. • Error messages flow back to the root via the errors_out attributes. • Messages also flow up out of expr trees. • Whenever a type check is performed, the type attribute may be used to help create a new message and append it to a list.
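The attribute flow described above can be sketched as a pair of Python functions. This is an illustration under assumptions, not the attribute grammar of Figure 4.13: nodes are tuples, the chain of items is a Python list, the node kinds, function names, and error messages are hypothetical, and only a subset of the expr variants is covered.

```python
def check_expr(e, symtab):
    """Return (type, errors) for an expr node; both are synthesized."""
    kind = e[0]
    if kind == "int_const":
        return "int", []
    if kind == "real_const":
        return "real", []
    if kind == "id":                       # type comes from the symbol table
        name = e[1]
        if name in symtab:
            return symtab[name], []
        return "error", [f"{name} undefined"]
    if kind == "float":                    # explicit int-to-real conversion
        t, errs = check_expr(e[1], symtab)
        if t == "int":
            return "real", errs
        if t == "error":
            return "error", errs           # already reported below this node
        return "error", errs + ["float expects int"]
    if kind == "bin_op":
        _, op, left, right = e
        lt, lerrs = check_expr(left, symtab)
        rt, rerrs = check_expr(right, symtab)
        errs = lerrs + rerrs               # messages flow up out of expr trees
        if "error" in (lt, rt):
            return "error", errs
        if lt != rt:
            return "error", errs + [f"type clash at {op}"]
        return lt, errs
    raise ValueError(f"unknown expr node {kind!r}")


def check_program(items):
    """Walk the item chain left to right; return the final error list."""
    symtab = {}                            # inherited: names declared so far
    errors = []                            # plays the role of errors_in
    for item in items:
        kind = item[0]
        if kind in ("int_decl", "real_decl"):
            name = item[1]
            if name in symtab:
                errors = errors + [f"{name} redefined"]
            else:
                typ = "int" if kind == "int_decl" else "real"
                symtab = {**symtab, name: typ}
        elif kind == "read":
            if item[1] not in symtab:
                errors = errors + [f"{item[1]} undefined"]
        elif kind == "write":
            _, errs = check_expr(item[1], symtab)
            errors = errors + errs
        elif kind == "assign":
            _, name, e = item
            t, errs = check_expr(e, symtab)
            errors = errors + errs
            if name not in symtab:
                errors = errors + [f"{name} undefined"]
            elif t != "error" and t != symtab[name]:
                errors = errors + ["type clash at :="]
    return errors                          # errors_out of the last item


# int a; read a; real b; read b; write (float(a) + b) / 2.0
average = [
    ("int_decl", "a"), ("read", "a"),
    ("real_decl", "b"), ("read", "b"),
    ("write", ("bin_op", "/",
               ("bin_op", "+", ("float", ("id", "a")), ("id", "b")),
               ("real_const", 2.0))),
]
errors = check_program(average)
```

Returning "error" as a type, without a new message, keeps one mistake from producing a cascade of follow-on messages higher in the expression tree.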

Summary • Most compilers rely on action routines that evaluate attribute rules at specific points in a parse. • The automatic approach is easier to maintain; the ad hoc approach is slightly faster and more flexible. • In a one-pass compiler, semantic functions or action routines are responsible for all of semantic analysis and code generation; when these are separate phases, the action routines simply build a syntax tree.
