Compiler Construction

Compiler Construction 강의 8

LR(k) ≡ LR(1) Context-Free Languages Context-free languages Deterministic languages (LR(k)) LL(k) languages Simple precedence languages Operator precedence languages LL(1) languages The inclusion hierarchy for context-free languages

Context-Free Grammars Context-free grammars Floyd-Evans Parsable Unambiguous CFGs Operator Precedence LR(k) LL(k) • Operator precedence includes some ambiguous grammars • LL(1) is a subset of SLR(1) LR(1) LALR(1) SLR(1) LL(1) The inclusion hierarchy for context-free grammars LR(0)

Beyond Syntax • There is a level of correctness that is deeper than grammar fie(a,b,c,d) int a, b, c, d; { … } fee() { int f[3],g[0], h, i, j, k; char *p; fie(h,i,“ab”,j, k); k = f * i + j; h = g[17]; printf(“<%s,%s>.\n”, p,q); p = 10; } What is wrong with this program? (let me count the ways …)

Semantics • There is a level of correctness that is deeper than grammar fie(a,b,c,d) int a, b, c, d; { … } fee() { int f[3],g[0], h, i, j, k; char *p; fie(h,i,“ab”,j, k); k = f * i + j; h = g[17]; printf(“<%s,%s>.\n”, p,q); p = 10; } What is wrong with this program? (let me count the ways …) • declared g[0], used g[17] • wrong number of args to fie() •“ab” is not an int • wrong dimension on use of f • undeclared variable q • 10 is not a character string All of these are “deeper than syntax”

Syntax-directed translation • In syntax-directed translation, we attach ATTRIBUTES to grammar symbols. • The values of the attributes are computed by SEMANTIC RULES associated with grammar productions. • Conceptually, we have the following flow: • In practice, however, we do everything in a single pass.

Syntax-directed translation • There are two ways to represent the semantic rules we associate with grammar symbols. • SYNTAX-DIRECTED DEFINITIONS (SDDs) do not specify the order in which semantic actions should be executed • TRANSLATION SCHEMES explicitly specify the ordering of the semantic actions. • SDDs are higher level; translation schemes are closer to • an implementation

Syntax-Directed Definitions

Syntax-directed definitions • The SYNTAX-DIRECTED DEFINITION (SDD) is a generalization of the context-free grammar. • Each grammar symbol in the CFG is given a set of ATTRIBUTES. • Attributes can be any type (string, number, memory loc, etc) • SYNTHESIZED attributes are computed from the values of the CHILDREN of a node in the parse tree. • INHERITED attributes are computed from the attributes of the parents and/or siblings of a node in the parse tree.

Attribute dependencies • Given a SDD, we can describe the dependencies between the attributes with a DEPENDENCY GRAPH. • A parse tree annotated with attributes is called an ANNOTATED parse tree. • Computing the attributes of the nodes in a parse tree is called ANNOTATING or DECORATING the tree.

Form of a syntax-directed definition • In a SDD, each grammar production A -> α has associated with it semantic rules • b := f( c1, c2, …, ck ) • where f() is a function, and either • b is a synthesized attribute of A, and c1, c2, …, are attributes of the grammar symbols of α, or • b is an inherited attribute of one of the symbols on the RHS, and c1, c2, … are attributes of the grammar symbols of α • in either case, we say b DEPENDS on c1, c2, …, ck.

Semantic rules • Usually we actually write the semantic rules with expressions instead of functions. • If the rule has a SIDE EFFECT, e.g. updating the symbol table, we write the rule as a procedure call. • When a SDD has no side effects, we call it an ATTRIBUTE GRAMMAR (Programming Languages course에서 설명).

Synthesized attributes • Synthesized attributes depend only on the attributes of • children. They are the most common attribute type. • If a SDD has synthesized attributes ONLY, it is called a S-ATTRIBUTED DEFINITION. • S-attributed definitions are convenient since theattributes can be calculated in a bottom-up traversal of the parse tree.

Example SDD: desk calculator • ProductionSemantic Rule • L -> E newline print( E.val ) • E -> E1 + T E.val := E1.val + T.val • E -> T E.val := T.val • T -> T1 * F T.val := T1.val x F.val • T -> F T.val := F.val • F -> ( E ) F.val := E.val • F -> number F.val := number.lexval • Notice the similarity to the yacc spec from last lecture.

Calculating synthesized attributes Input string: 3*5+4 newline Annotated tree:

Inherited attributes • An inherited attribute is defined in terms of theattributes of the node’s parents and/or siblings. • Inherited attributes are often used in compilers for passing contextual information forward, for example, the type keyword in a variable declaration statement.

Example SDD with an inherited attribute • Suppose we want to describe decls like “real x, y, z” • Production Semantic Rules • D -> T L L.in := T.type • T -> int T.type := integer • T -> real T.type := real • L -> L1, id L1.in := L.in • addtype( id.entry, L.in ) • L -> id addtype( id.entry, L.in ) • L.in is inherited since it depends on a sibling or parent. addtype() is just a procedure that sets the type field in the symbol table.

Annotated parse tree for real id1, id2, id3 What are the dependencies?

Dependency Graphs

Dependency graphs • If an attribute b depends on attribute c, then attribute b has to be evaluated AFTER c. • DEPENDENCY GRAPHS visualize these requirements. • Each attribute is a node • We add edges from the node for attribute c to the node for attribute b, if b depends on c. • For procedure calls, we introduce a dummy synthesized attribute that depends on the parameters of the procedure calls.

Dependency graph example • ProductionSemantic Rule • E -> E1 + E2 E.val := E1.val + E2.val • Wherever this rule appears in the parse, tree we draw:

Example dependency graph

Finding a valid evaluation order • A TOPOLOGICAL SORT of a directed acyclic graph orders the nodes so that for any nodes a and b such that a -> b, a appears BEFORE b in the ordering. • There are many possible topological orderings for a DAG. • Each of the possible orderings gives a valid order for evaluation of the semantic rules.

Example dependency graph

Syntax Trees

Application: syntax tree construction • One thing SDDs are useful for is construction of SYNTAX TREES. • Recall from Lecture 1 that a syntax tree is a condensed form of parse tree. • Syntax trees are useful for representing programming language constructs like expressions and statements. • They help compiler design by decoupling parsing from translation.

Syntax trees • Leaf nodes for operators and keywords are removed. • Internal nodes corresponding to uninformative non-terminals are replaced by the more meaningful operators.

SDD for syntax tree construction • We need some functions to help us build the syntax tree: • mknode(op,left,right) constructs an operator node with label op, and two children, left and right • mkleaf(id,entry) constructs a leaf node with label id and a pointer to a symbol table entry • mkleaf(num,val) constructs a leaf node with label num and the token’s numeric value val • Use these functions to build a syntax tree for a-4+c: • P1 := mkleaf( id, st_entry_for_a ) • P2 := …

SDD for syntax tree construction • ProductionSemantic Rules • E -> E1 + T E.nptr := mknode( ‘+’, E1.nptr,T.nptr) • E -> E1 - T E.nptr := mknode( ‘-’, E1.nptr,T.nptr) • E -> T E.nptr := T.nptr • T -> ( E ) T.nptr := E.nptr • T -> id T.nptr := mkleaf( id, id.entry ) • T -> num T.nptr := mkleaf( num, num.val ) • Note that this is a S-attributed definition. • Try to derive the annotated parse tree for a-4+c.

Evaluating SDDs Bottom-up

Bottom-up evaluation of S-attributed defn • How can we build a translator for a given SDD? • For S-attributed definitions, it’s pretty easy! • A bottom-up shift-reduce parser can evaluate the (synthesized) attributes as the input is parsed. • We store the computed attributes with the grammar symbols and states on the stack. • When a reduction is made, we calculate the values of any synthesized attributes using the already-computed attributes from the stack.

Bottom-up evaluation of S-attributed defns • In the scheme, our parser’s stack now stores grammar symbols AND attribute values. • For every production A -> XYZ with semantic rule A.a := f( X.x, Y.y, Z.z ), before XYZ is reduced to A, we should already have X.x Y.y and Z.z on the stack.

Desk calculator example • If attribute values are placed on the stack as described, it is now easy to implement the semantic rules for the desk calculator. • ProductionSemantic RuleCode • L -> E newline print( E.val ) print val[top-1] • E -> E1 + T E.val := E1.val + T.val val[newtop] = val[top-2]+val[top] • E -> T E.val := T.val /*newtop==top, so nothing to do*/ • T -> T1 * F T.val := T1.val x F.val val[newtop] = val[top-2]+val[top] • T -> F T.val := F.val /*newtop==top, so nothing to do*/ • F -> ( E ) F.val := E.val val[newtop] = val[top-1] • F -> number F.val := number.lexval /*newtop==top, so nothing to do*/

Desk calculator example • For input 3 * 5 + 4 newline, what happens? • Assume when a terminal’s attribute is shifted when it is. • InputStatesValuesAction • 3*5+4n shift • *5+4n number 3 reduce F->number • reduce T->F • shift • shift • reduce F->number • reduce T->T*F • reduce E->T • shift • shift • reduce F->number • reduce T->F • reduce E->E+T • shift • reduce E->En

L-attributed definitions • S-attributed definitions only allow synthesized attributes. • We saw earlier that inherited attributes are useful. • But we prefer definitions that can be evaluated in one pass. • L-ATTRIBUTED definitions are the set of SDDs whose attributes can be evaluated in a DEPTH-FIRST traversal of the parse tree.

Depth-first traversal • algorithm dfvisit( node n ) { • for each child m of n, in left-to-right order, do { • evaluate the inherited attributes of m • dfvisit( m ) • } • evaluate the synthesized attributes of n • }

L-attributed definitions • If a definition can be evaluated by dfvisit() we say it is L-attributed. • Another way of putting it: a SDD is L-attributed if each INHERITED attribute of Xi on the RHS of a production A -> X1 X2… Xn depends ONLY on: • The attributes of X1, …, Xi-1 (to the LEFT of Xi in the production) • The INHERITED attributes of A. • Since S-attributed definitions have no inherited attributes, they are necessarily L-attributed.

L-attributed definitions • Is the following SDD L-attributed? • ProductionSemantic Rules • A -> L M L.i := l(A.i) • M.i := m(L.s) • A.s := f(M.s) • A -> Q R R.i := r(A.i) • Q.i := q(R.s) • A.s := f(Q.s)

Translation Schemes

Translation schemes • Translation schemes are another way to describe syntax-directed translation. • Translation schemes are closer to a real implementation because the specify when, during the parse, attributes should be computed. • Example, for conversion of INFIX expressions to POSTFIX: • E -> T R • R -> addop T { print ( addop.lexeme ) } | ε • T -> num { print( num.val ) } • This translation scheme will turn 9-5+2 into 95-2+

Turning a SDD into a translation scheme • For a translation scheme to work, it must be the case that an attribute is computed BEFORE it is used. • If the SDD is S-attributed, it is easy to create the translation scheme implementing it: • ProductionSemantic Rule • T -> T1 * F T.val := T1.val x F.val • Translation scheme: • T -> T1 * F { T.val = T1.val * F.val } • That is, we just turn the semantic rule into an action and add at the far right hand side. This DOES NOT WORK for inherited attribs!

Turning a SDD into a translation scheme • With inherited attributes, the translation scheme designer needs to follow three rules: • An inherited attribute for a symbol on the RHS MUST be computed in an action BEFORE the occurrence of the symbol. • An action MUST NOT refer to the synthesized attribute of a symbol to the right of the action. • A synthesized attribute for the LHS nonterminal can ONLY be computed in an action FOLLOWING the symbols for all the attributes it references.

Example • This translation scheme does NOT follow the rules: • S -> A1 A2 { A1.in = 1; A2.in = 2 } • A -> a { print( A.in ) } • If we traverse the parse tree depth first, A1.in has not been set when referred to in the action print( A.in ) • S -> { A1.in = 1 } A1 { A2.in = 2 } A2 • A -> a { print( A.in ) }

Bottom-up evaluation of inherited attributes • The first step is to convert the SDD to a valid translation scheme. • Then a few “tricks” have to be applied to the translation scheme. • It is possible, with the right tricks, to do one-pass bottom-up attribute evaluation for ALL LL(1) grammars and MOST LR(1) grammars, if the SDD is L-attributed. • This means when adding semantic actions to your yacc specifications, you might run into trouble. See section 5.6 of the text!

Beyond Syntax • These questions are part of context-sensitive analysis • Answers depend on values, not parts of speech • Questions & answers involve non-local information • Answers may involve computation • How can we answer these questions? • Use formal methods • Context-sensitive grammars? • Attribute grammars? (attributed grammars?) • Use ad-hoc techniques • Symbol tables • Ad-hoc code (action routines) • In scanning & parsing, formalism won; different story here.

Beyond Syntax • Telling the story • The attribute grammar formalism is important • Succinctly makes many points clear • Sets the stage for actual, ad-hoc practice • The problems with attribute grammars motivate practice • Non-local computation • Need for centralized information • Some folks in the community still argue for attribute grammars • We will cover attribute grammars, then move on to ad-hoc ideas

Attribute Grammars • What is an attribute grammar? • A context-free grammar augmented with a set of rules • Each symbol in the derivation has a set of values, or attributes • The rules specify how to compute a value for each attribute Example grammar Number → Sign List Sign → + | – List → List Bit | Bit Bit → 0 | 1 This grammar describes signed binary numbers We would like to augment it with rules that compute the decimal value of each valid input string

Number Sign List List Bit – List Bit 1 Bit 0 1 Examples Number Sign List Bit – 1

Attribute Grammars • Add rules to compute the decimal value of a signed binary number

Compiler Construction