Chapter 2

Chapter 2 Chang Chi-Chung 2008.03 rev.1

A Simple Syntax-Directed Translator • This chapter contains introductory material to Chapters 3 to 8 • To create a syntax-directed translator that maps infix arithmetic expressions into postfix expressions. • Building a simple compiler involves: • Defining the syntax of a programming language • Develop a source code parser: for our compiler we will use predictive parsing • Implementing syntax directed translation to generate intermediate code

A Code Fragment To Be Translated To extend syntax-directed translator to map code fragments into three-address code. See appendix A. { int i; int j; float[100] a; float v; float x; while (true) { do i = i + 1; while ( a[i] < v ); do j = j – 1; while ( a[j] > v ); if ( i>= j ) break; x = a[i]; a[i] = a[j]; a[j] = x; } } 1: i = i + 1 2: t1 = a [ i ] 3: if t1 < v goto 1 4: j = j -1 5: t2 = a [ j ] 6: if t2 > v goto 4 7: ifFalse i >= j goto 9 8: goto 14 9: x = a [ i ] 10: t3 = a [ j ] 11: a [ i ] = t3 12: a [ j ] = x 13: goto 1 14:

A Model of a Compiler Front End Lexical analyzer Parser Intermediate Code Generator Source program Tokenstream Syntaxtree Three-address code CharacterStream Symbol Table

Abstract syntax trees Tree-Address instructions do-while body > [ ] v assign i + a i i 1 Two Forms of Intermediate Code 1: i = i + 1 2: t1 = a [ i ] 3: if t1 < v goto 1

Syntax Definition • Using Context-free grammar (CFG) • BNF: Backus-Naur Form • Context-free grammar has four components: • A set of tokens (terminal symbols) • A set of nonterminals • A set of productions • A designated start symbol

Example of CFG • G = <T, N, P, S> • T = { +,-,0,1,2,3,4,5,6,7,8,9} • N = { list, digit} • P = • list  list + digit • list  list – digit • list  digit • digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • S = list

Derivations • The set of all strings (sequences of tokens) generated by the CFG using derivation • Begin with the start symbol • Repeatedly replace a nonterminal symbol in the current sentential form with one of the right-hand sides of a production for that nonterminal

Example of the Derivations listlist+digit list-digit+digit digit-digit+digit 9 -digit+digit 9 - 5 +digit 9 - 5 + 2 • Production • list  list + digit • list  list – digit • list  digit • digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • Leftmost derivation • replaces the leftmost nonterminal (underlined) in each step. • Rightmost derivation • replaces the rightmost nonterminal in each step.

A X Y Z Parser Trees • Given a CFG, a parse tree according to the grammar is a tree with following propertes. • The root of the tree is labeled by the start symbol • Each leaf of the tree is labeled by a terminal (=token) or  • Each interior node is labeled by a nonterminal • If A  X1 X2 … Xn is a production, then node A has immediate children X1, X2, …, Xn where Xi is a (non)terminal or  ( denotes the empty string) • Example • A XYZ

list list digit list digit digit The sequence ofleafs is called theyield of the parse tree 9 - 5 + 2 Example of the Parser Tree • Parse tree of the string 9-5+2 using grammar G

Ambiguity • Consider the following context-free grammar • This grammar is ambiguous, because more than one parse tree represents the string 9-5+2 G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string> P = string string+string | string-string | 0 | 1 | … | 9

string string string string string string string string string string 9 - 5 + 2 9 - 5 + 2 Ambiguity (Cont’d)

Associativity of Operators • Left-associative • If an operand with an operator on both sides of it, then it belongs to the operator to its left. • string a+b+c has the same meaning as (a+b)+c • Left-associative operators have left-recursive productions • left  left + term | term • Right-associative • If an operand with an operator on both sides of it, then it belongs to the operator to its right. • string a=b=c has the same meaning as a=(b=c) • Right-associative operators have right-recursive productions • right  term = right | term

right list letter right list digit letter right list digit letter digit a = b = c a + b + c Associativity of Operators (cont’d) left-associative right-associative

Precedence of Operators • String 9+5*2 has the same meaning as 9+(5*2) • * has higher precedence than + • Constructs a grammar for arithmetic expressions with precedence of operators. • left-associative : + - (expr) • left-associative：* / (term) Step 1: factordigit | ( expr ) Step 3: expr  expr + term | expr – term | term Step 2: term  term * factor | term / factor | factor Step 4: expr expr+term | expr – term | termterm  term *factor | term /factor | factorfactor digit | ( expr )

An Example: Syntax of Statements • The grammar is a subset of Java statements. • This approach prevents the build-up of semicolons after statements such as if- and while-, which end with nested substatements. stmt  id = expression ; | if ( expression ) stmt | if ( expression ) stmt else stmt | while( expression ) stmt | do stmt while ( expression ) ; | { stmts } stmts  stmts stmt| 

Syntax-Directed Translation • Syntax-Directed translation is done by attaching rules or program fragments to productions in a grammar. • Translate infix expressions into postfix notation. ( in this chapter ) • Infix: 9 – 5 + 2 • Postfix: 9 5 – 2 + • An Example • expr expr1+term • The pseudo-code of the translation translate expr1 ; translate term ; handle + ;

Syntax-Directed Translation (Cont’d) • Two concepts (approaches) related to Syntax-Directed Translation. • Synthesized Attributes • Syntax-directed definition • Build up a translation by attaching strings (semantic rules) as attributes to the nodes in the parse tree. • Translation Schemes • Syntax-directed translation • Build up a translation by program fragments which are called semantic actions and embedded within production bodies.

Syntax-directed definition • The syntax-directed definition associates • With each grammar symbol (terminals and nonterminals), a set of attributes. • With each production, a set of semantic rules for computing the values of the attributesassociated with the symbols appearing in the production. • An attribute is said to be • Synthesized • if its value at a parse-tree node is determined from attribute values at its children and at the node itself. • Inherited • if its value at a parse-tree node is determined from attribute values at the node itself, its parent, and its siblings in the parse tree.

expr.t = “95-2+” expr.t = “95-” term.t = “2” expr.t = “9” term.t = “5” term.t = “9” 9 - 5 + 2 An Example: Synthesized Attributes • An annotated parse tree • Suppose a node N in a parse tree is labeled by grammar symbol X. • The X.a is denoted the value of attribute a of X at node N.

Semantic Rules

Depth-First Traversals • Tree traversals • Breadth-First • Depth-First • Preorder: N L R • Inorder: L N R • Postorder: L R N • Depth-First Traversals: Postorder、From left to right procedure visit(node N){ for ( each child C of N, from left to right ) {visit(C); } evaluate semantic rules at node N;}

expr.t = 95-2+ expr.t = 95- term.t = 2 expr.t = 9 term.t = 5 term.t = 9 9 - 5 + 2 Example: Depth-First Traversals Note: all attributes are the synthesized type

rest { print(“+”) } + term rest Translation Schemes • A translation scheme is a CFG embedded with semantic actions • Example • rest +term{ print(“+”) }rest Embedded Semantic Action

expr expr + term { print(‘+’) } expr - term { print(‘-’) } 2 { print(‘2’) } term 5 { print(‘5’) } 9 { print(‘9’) } An Example: Translation Scheme expr expr+term { print(‘+’) }expr  expr–term { print(‘-’) }expr  termterm  0 { print(‘0’) } term  1 { print(‘1’) }…term  9 { print(‘9’) }

Parsing • The process of determining if a string of terminals (tokens) can be generated by a grammar. • Time complexity: • For any CFG there is a parser that takes at most O(n3) time to parse a string of n terminals. • Linear algorithms suffice to parse essentially all languages that arise in practice. • Two kinds of methods • Top-down: constructs a parse tree from root to leaves • Bottom-up: constructs a parse tree from leaves to root

Top-Down Parsing • Recursive descentparsing is a top-down method of syntax analysis in which a set of recursive procedures is used to process the input. • One procedure is associated with each nonterminal of a grammar. • If a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input lookahead information • Predictive parsing • A special form of recursive descent parsing • The lookahead symbol unambiguously determines the flow of control through the procedure body for each nonterminal.

stmt ( optexpr optexpr stmt for ; optexpr ; ) expr expr other ε An Example: Top-Down Parsing stmt  expr; | if ( expr) stmt | for ( optexpr ; optexpr ; optexpr ) stmt | other optexpr  | expr

void stmt() { switch ( lookahead ) { case expr: match(expr); match(‘;’); break; case if: match(if); match(‘(‘); match(expr); match(‘)’); stmt(); break; case for: match(for); match(‘(‘); optexpr(); match(‘;’); optexpr(); match(‘;’); optexpr(); match(‘)’); stmt(); break; case other: match(other); break; default: report(“syntax error”); }} void optexpr() { if ( lookahead == expr ) match(expr);} void match(terminal t) { if ( lookahead == t ) lookahead = nextTerminal; else report(“syntax error”);} Pseudocode For a Predictive Parser stmt expr; | if ( expr) stmt | for ( optexpr ; optexpr ; optexpr ) stmt | other Use ε-Productions optexpr  | expr

for optexpr optexpr optexpr stmt ( ; ; ) lookahead Example: Predictive Parsing Parse Tree LL(1) stmt optexpr() match(‘;‘) optexpr() match(‘)‘) stmt() match(for) match(‘(‘) optexpr() match(‘;‘) Input for ( ; expr ; expr ) other

FIRST • FIRST() is the set of terminals that appear as the first symbols of one or more strings generated from  •  is Sentential Form • Example • FIRST(stmt) = { expr, if, for, other } • FIRST(expr ;) = { expr } stmt expr; | if ( expr) stmt | for ( optexpr ; optexpr ; optexpr ) stmt | other

Examples: First type simple|^ id|array [ simple ] of typesimple integer|char| num dotdot num FIRST(simple) = { integer, char, num } FIRST(^ id) = { ^ } FIRST(type) = { integer, char, num, ^, array }

Designing a Predictive Parser • A predictive parser is a program consisting of a procedure for every nonterminal. • The procedure for nonterminal A • It decides which A-production to use by examining the lookahead symbol. • Left Factor • Left Recursion • ε Production • Mimics the body of the chosen production. • Applying translation scheme • Construct a predictive parser, ignoring the actions. • Copy the actions from the translation scheme into the parser

Left Factor • Left Factor • One production for nonterminal A starts with the same symbols. • Example: stmt  if ( expr ) stmt | if ( expr ) stmt else stmt • Use Left Factoring to fix it stmt if ( expr )stmt rest rest else stmt |ε

Left Recursion • Left Recursive • A production for nonterminal A starts with a self reference. • A Aα | β • An Example: • expr expr + term | term • Rewrite the left recursive to right recursive by using the following rules. A βR R  αR |ε

A A … R R A … A R A R ε Example: Left and Right Recursive right recursive left recursive

+ - 2 expr 5 9 expr term expr term helper term 9 - 5 + 2 Abstract and Concrete Syntax

Conclusion: Parsing and Translation Scheme • Give a CFG grammar G as below: expr expr+term { print(‘+’) } expr  expr–term { print(‘-’) } expr  term term  0 { print(‘0’) } term  1 { print(‘1’) } … term  9 { print(‘9’) } • Semantic actions for translating into postfix notation.

Conclusion: Parsing and Translation Scheme • Step 1 • To elimination left-recursion • Technique A Aα | Aβ| γ into A γR R  αR |βR| ε • Use the rule to transforms G.

Conclusion: Parsing and Translation Scheme • Left-Recursion-elimination expr term rest rest +term { print(‘+’) }rest |–term { print(‘-’) }rest |ε term  0 { print(‘0’) } term  1 { print(‘1’) } … term  9 { print(‘9’) }

expr rest term 9 { print(‘9’) } term { print(‘-’) } rest - term 5 { print(‘5’) } + { print(‘+’) } rest 2 { print(‘2’) } ε An Example: Left-Recursion-elimination expr term rest rest +term { print(‘+’) } rest| –term { print(‘-’) } rest| εterm  0 { print(‘0’) } | 1 { print(‘1’) } |…|9 { print(‘9’) }

Conclusion: Parsing and Translation Scheme void expr() { term(); rest(); } void rest() { if ( lookahead == ‘+’ ) { match(‘+’); term(); print(‘+’); rest(); } else if ( lookahead == ‘-’ ) { match(‘-’); term(); print(‘-’); rest(); } else { } //do nothing with the input } void term() { if ( lookahead is a digit ) { t = lookahead; match(lookahead); print(t); } else report(“syntax error”);} • Step 2 • Procedures for Nonterminals.

Conclusion: Parsing and Translation Scheme • Step 3 • Simplifying the Translator void rest() { while ( true ) { if ( lookahead == ‘+’ ) { match(‘+’); term(); print(‘+’); continue; } else if (lookahead == ‘-’) { match(‘-’); term(); print(‘-’); continue; } break; } } void rest() { if ( lookahead == ‘+’ ) { match(‘+’); term(); print(‘+’); rest(); } else if (lookahead == ‘-’) { match(‘-’); term(); print(‘-’); rest(); } else { }

Conclusion: Parsing and Translation Scheme import java.io.*; class Parser{ static int lookahead; public Parser() throws IOException{ lookahead = System.in.read(); } void expr() { term(); while ( true ) { if ( lookahead == ‘+’ ) { match(‘+’); term(); System.out.write(‘+’); continue; } elseif (lookahead == ‘-’) { match(‘-’); term(); System.out.write(‘-’); continue; } elsereturn; } • Complete void term() throws IOException { if (Character.isDigit((char)lookahead){ System.out.write((char)lookahead); match(lookahead); } else throw new Error(“syntax error”); } void match(int t) throws IOException { if ( lookahead == t ) lookahead = System.in.read();else throw new Error(“syntax error”); } }

Chapter 2

Chapter 2

Presentation Transcript

Chapter 2-2

Chapter 2-2

Chapter 2 - 2

Chapter 2

Chapter 2

Chapter 2

CHAPTER 2

Chapter 2

Chapter 2

Chapter 2

Chapter 2

Chapter 2

Chapter 2

Chapter 2

Chapter 2

Chapter 2

Chapter 2

Chapter 2

Chapter 2

Chapter 2

Chapter 2

CHAPTER 2