Learn about context-free grammars, hierarchical structures, and productions in programming languages. Explore examples illustrating grammar rules, parse trees, and ambiguity. Understand operator associativity, precedence, and syntax-directed translation.
Chapter 2 A Simple One-Pass Compiler Yu-Chen Kuo
2.1 Overview • A programming language is defined by: • What its programs look like (syntax: context-free grammars) • What its programs mean (semantics: more difficult to specify)
2.2 Syntax Definition • Context-free grammar • A grammar describes the hierarchical structure of language constructs • stmt → if ( expr ) stmt else stmt • Such a rule is called a production • Tokens: if, (, ), else • Nonterminals: expr, stmt
Context-free Grammar • A set of tokens (terminals) • Digits • Signs (+, -, <, =) • Keywords such as if, while • A set of nonterminals • A set of productions • nonterminal → sequence of terminals and/or nonterminals • left side → right side • One nonterminal is designated the start symbol (by convention, the left side of the first production)
Example 2.1: Grammar for the expression '9-5+2' • list → list + digit • list → list - digit • list → digit • digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • Equivalently: list → list + digit | list - digit | digit • Nonterminals: list (start symbol), digit • Terminals (tokens): +, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Example 2.1: Grammar for the expression '9-5+2' (Cont.) • Token strings are derived by starting with the start symbol and repeatedly replacing a nonterminal by the right side of a production for that nonterminal • Empty string: ε • All token strings that can be derived from the start symbol form the language defined by the grammar
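To make "the language defined by the grammar" concrete, the following is a minimal sketch of a membership test for the list grammar of Example 2.1. The function name is_list is an assumption made for illustration; it relies on the observation that every derivable string is one digit followed by zero or more operator-digit pairs.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if s is in the language of
 *   list -> list + digit | list - digit | digit
 * i.e. a single digit followed by zero or more (+|-) digit pairs.
 * The name is_list is hypothetical, chosen for this sketch. */
int is_list(const char *s)
{
    assert(s != NULL);
    if (!isdigit((unsigned char)*s))
        return 0;                 /* the leftmost symbol comes from list -> digit */
    s++;
    while (*s == '+' || *s == '-') {
        s++;                      /* consume the operator */
        if (!isdigit((unsigned char)*s))
            return 0;             /* an operator must be followed by a digit */
        s++;
    }
    return *s == '\0';            /* no trailing characters allowed */
}
```

Under these assumptions, is_list("9-5+2") and is_list("7") succeed, while strings such as "9-" or "95" are rejected because no sequence of productions derives them.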
Example 2.2: Parse Tree • A parse tree shows how the start symbol derives a string • list → list + digit • list → list - digit • list → digit • digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Parse Trees • A → X Y Z • The root is labeled by the start symbol • Each leaf is labeled by a token or by ε • Each interior node is labeled by a nonterminal • If A is the nonterminal labeling an interior node and X1, X2, ..., Xn are the labels of its children from left to right, then A → X1 X2 ... Xn is a production
Example 2.3: Pascal begin-end blocks • block → begin opt_stmts end • opt_stmts → stmt_list | ε • stmt_list → stmt_list ; stmt | stmt • stmt → if ( expr ) stmt else stmt | assignment stmt
Ambiguity of a Grammar • A grammar is said to be ambiguous if some string it generates has more than one parse tree.
Ambiguity of a Grammar (Cont.) • string → string + string | string - string • string → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • The string 9-5+2 has two parse trees, corresponding to the expressions (9-5)+2 and 9-(5+2)
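The ambiguity matters because the two parse trees give different values. A small sketch, with all function names assumed for illustration, that evaluates a three-operand string under each grouping:

```c
#include <assert.h>

/* Apply a single + or - operator (hypothetical helper for this sketch). */
static int apply(int left, char op, int right)
{
    assert(op == '+' || op == '-');
    return op == '+' ? left + right : left - right;
}

/* Value of "a op1 b op2 c" when the parse tree groups it as (a op1 b) op2 c. */
int eval_left_grouping(int a, char op1, int b, char op2, int c)
{
    return apply(apply(a, op1, b), op2, c);
}

/* Value of the same string when the parse tree groups it as a op1 (b op2 c). */
int eval_right_grouping(int a, char op1, int b, char op2, int c)
{
    return apply(a, op1, apply(b, op2, c));
}
```

For 9-5+2 the first grouping yields 6 and the second yields 2, so a grammar that permits both trees does not pin down the meaning of the expression.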
Associativity of Operators • Left associative: 9+5-2 is treated as (9+5)-2 • +, -, *, / • The parse tree grows down towards the left • Right associative: a=b=c is treated as a=(b=c) • The parse tree grows down towards the right
Associativity of Operators (Cont.) • right → letter = right | letter • letter → a | b | c | … | z
Precedence of Operators • 9+5*2 is treated as 9+(5*2) • *, / have higher precedence than +, - • *, /, +, - are all left associative • term for *, / • term → term * factor | term / factor | factor • expr for +, - • expr → expr + term | expr - term | term • factor → digit | ( expr )
Precedence of Operators (Cont.) • Syntax of expressions • expr → expr + term | expr - term | term • term → term * factor | term / factor | factor • factor → digit | ( expr ) • Syntax of statements for Pascal (ambiguous?) • stmt → id := expr | if expr then stmt | if expr then stmt else stmt | while expr do stmt | begin opt_stmts end
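One way to see that the layered expr/term/factor grammar encodes precedence and left associativity is to evaluate expressions with one procedure per nonterminal. The sketch below is an illustration only: the names cursor, failed, and eval_expr are assumptions, and the left-recursive productions are written as loops, which accumulates results from left to right.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *cursor;  /* next unread character (hypothetical name) */
static int failed;          /* set to 1 on a syntax error or division by zero */

static int expr(void);

/* factor -> digit | ( expr ) */
static int factor(void)
{
    if (isdigit((unsigned char)*cursor))
        return *cursor++ - '0';
    if (*cursor == '(') {
        cursor++;
        int value = expr();
        if (*cursor == ')')
            cursor++;
        else
            failed = 1;           /* missing closing parenthesis */
        return value;
    }
    failed = 1;                   /* neither a digit nor '(' */
    return 0;
}

/* term -> term * factor | term / factor | factor, written as a loop */
static int term(void)
{
    int value = factor();
    while (!failed && (*cursor == '*' || *cursor == '/')) {
        char op = *cursor++;
        int right = factor();
        if (failed)
            break;
        if (op == '*')
            value *= right;
        else if (right != 0)
            value /= right;
        else
            failed = 1;           /* division by zero */
    }
    return value;
}

/* expr -> expr + term | expr - term | term, written as a loop */
static int expr(void)
{
    int value = term();
    while (!failed && (*cursor == '+' || *cursor == '-')) {
        char op = *cursor++;
        int right = term();
        value = (op == '+') ? value + right : value - right;
    }
    return value;
}

/* Evaluates s; returns 1 and stores the value on success, 0 on error. */
int eval_expr(const char *s, int *result)
{
    assert(s != NULL && result != NULL);
    cursor = s;
    failed = 0;
    int value = expr();
    if (failed || *cursor != '\0')
        return 0;
    *result = value;
    return 1;
}
```

Because term is called from expr, every * and / is applied before the surrounding + or -, so 9+5*2 evaluates as 9+(5*2); the loops give the left-to-right grouping for operators of equal precedence.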
2.3 Syntax-Directed Translation • The syntax-directed definition and the translation scheme are two formalisms for specifying translations for programming language constructs • A syntax-directed definition uses a context-free grammar to specify the syntactic structure of the input • With each grammar symbol X, it associates a set of attributes, and with each production, a set of semantic rules for computing the values of the attributes X.a of the symbols in that production • The grammar and the set of semantic rules constitute the syntax-directed definition
2.3 Syntax-Directed Translation (Cont.) • A syntax-directed definition for translating expressions consisting of digits separated by plus or minus into postfix notation
Postfix Notation • If E is a variable or constant, then postfix(E) = E • If E is an expression of the form E1 op E2, then postfix(E) = E1' E2' op, where E1' = postfix(E1) and E2' = postfix(E2) • If E is an expression of the form (E1), then postfix(E) = postfix(E1) • postfix(9-5+2) = 95-2+
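Postfix notation needs no parentheses because a string can be evaluated in one left-to-right pass with a stack. The following is a minimal sketch for single-digit operands and the operators + and -; the function name eval_postfix and the capacity STACK_MAX are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define STACK_MAX 64  /* arbitrary capacity chosen for this sketch */

/* Evaluates a postfix string of single digits and the operators + and -.
 * Returns 1 and stores the value in *result on success, 0 if malformed. */
int eval_postfix(const char *s, int *result)
{
    int stack[STACK_MAX];
    size_t depth = 0;

    assert(s != NULL && result != NULL);
    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char)*s)) {
            if (depth == STACK_MAX)
                return 0;                     /* operand stack is full */
            stack[depth++] = *s - '0';        /* push the operand */
        } else if (*s == '+' || *s == '-') {
            if (depth < 2)
                return 0;                     /* operator lacks operands */
            int right = stack[--depth];       /* top of stack is the right operand */
            int left = stack[--depth];
            stack[depth++] = (*s == '+') ? left + right : left - right;
        } else {
            return 0;                         /* unexpected character */
        }
    }
    if (depth != 1)
        return 0;                             /* leftover or missing operands */
    *result = stack[0];
    return 1;
}
```

Evaluating 95-2+ pushes 9 and 5, replaces them with 4, pushes 2, and finishes with 6, the same value as (9-5)+2.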
Robot’s position
Depth-First Traversals
Translation Schemes • A translation scheme is a context-free grammar in which semantic actions are embedded within the right sides of productions • A translation scheme is like a syntax-directed definition, except that the order of evaluation of the semantic rules is explicitly shown
2.4 Parsing • Parsing is the process of determining whether a string of tokens can be generated by a grammar. • For any context-free grammar, there is a parser that takes at most O(n³) time to parse a string of n tokens, but cubic time is too expensive. • Given a programming language, we can generally construct a grammar that can be parsed in linear time (making a single left-to-right scan, looking ahead one token at a time)
2.4 Parsing (Cont.) • Top-down parser: parse tree construction starts at the root and proceeds towards the leaves • Bottom-up parser: parse tree construction starts at the leaves and proceeds towards the root (handles a larger class of grammars)
Top-Down Parsing • The parse tree is constructed by starting with the root, labeled with the starting nonterminal, and repeatedly performing the following two steps. • At node n, labeled with nonterminal A, select one of the productions for A and construct children at n for the symbols on the right side of the production. • Find the next node at which a subtree is to be constructed.
Example • type → simple | ↑ id | array [ simple ] of type • simple → integer | char | num dotdot num • e.g., array [ num dotdot num ] of integer
Predictive Parsing • Recursive-descent parsing is a top-down method of parsing in which a procedure is associated with each nonterminal of the grammar
Predictive Parsing (Cont.) • type → simple | ↑ id | array [ simple ] of type • simple → integer | char | num dotdot num
Predictive Parsing (Cont.) • Use the lookahead symbol and the first symbols (FIRST) of the right side of each production to unambiguously determine the procedure selected for each nonterminal. • FIRST(α): the set of tokens that appear as the first symbols of one or more strings generated from α • FIRST(simple) = { integer, char, num } • FIRST(↑ id) = { ↑ } • FIRST(array [ simple ] of type) = { array } • If A → α | β, then predictive parsing requires FIRST(α) ∩ FIRST(β) = ∅
When to Use ε-Productions • stmt → begin opt_stmts end • opt_stmts → stmt_list | ε • While parsing opt_stmts, if the lookahead symbol is not in FIRST(stmt_list), the ε-production is used. This choice is correct if the lookahead symbol is end; any other lookahead symbol results in an error
Designing a Predictive Parser • A predictive parser consists of a procedure for every nonterminal • Each procedure does two things. • It decides which production to use by looking at the lookahead symbol. The production with right side α is used if the lookahead symbol is in FIRST(α). If the lookahead symbol is not in the FIRST set of any right side, a production with ε on the right side is used. • It uses a production by mimicking the right side. A nonterminal results in a call to the procedure for that nonterminal. A token matching the lookahead symbol results in reading the next input token.
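The two steps above can be sketched for the type grammar used in the earlier example. This is an illustration under stated assumptions, not the deck's own program: the token codes, the use of a T_END-terminated token array as input, and the name parse_type are all hypothetical, and the pointer alternative is taken to be ↑ id.

```c
#include <assert.h>
#include <stddef.h>

/* Token codes; the names are assumptions made for this sketch. */
enum token {
    T_INTEGER, T_CHAR, T_NUM, T_DOTDOT, T_ID, T_ARRAY, T_OF,
    T_LBRACKET, T_RBRACKET, T_POINTER, T_END
};

static const enum token *input;  /* remaining tokens, terminated by T_END */
static enum token lookahead;
static int syntax_error;

/* Advance if the lookahead is the expected token, else record an error. */
static void match(enum token expected)
{
    if (syntax_error)
        return;
    if (lookahead == expected) {
        if (lookahead != T_END)
            lookahead = *++input;    /* read the next input token */
    } else {
        syntax_error = 1;
    }
}

/* simple -> integer | char | num dotdot num */
static void simple(void)
{
    switch (lookahead) {
    case T_INTEGER: match(T_INTEGER); break;
    case T_CHAR:    match(T_CHAR);    break;
    case T_NUM:
        match(T_NUM); match(T_DOTDOT); match(T_NUM);
        break;
    default:
        syntax_error = 1;            /* lookahead not in FIRST(simple) */
    }
}

/* type -> simple | ^ id | array [ simple ] of type */
static void type(void)
{
    if (syntax_error)
        return;
    switch (lookahead) {
    case T_INTEGER: case T_CHAR: case T_NUM:    /* FIRST(simple) */
        simple();
        break;
    case T_POINTER:                             /* FIRST(^ id) */
        match(T_POINTER); match(T_ID);
        break;
    case T_ARRAY:                               /* FIRST(array [ simple ] of type) */
        match(T_ARRAY); match(T_LBRACKET); simple();
        match(T_RBRACKET); match(T_OF); type();
        break;
    default:
        syntax_error = 1;            /* no production applies */
    }
}

/* Returns 1 if the T_END-terminated token array is derivable from type. */
int parse_type(const enum token *tokens)
{
    assert(tokens != NULL);
    input = tokens;
    lookahead = *input;
    syntax_error = 0;
    type();
    return !syntax_error && lookahead == T_END;
}
```

Each case label lists a FIRST set, so the lookahead alone selects the production; the body of each case mimics the right side, calling a procedure for a nonterminal and match for a token. The token sequence for array [ num dotdot num ] of integer is accepted by this sketch.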
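Eliminating Left Recursion • expr → expr + term | term • A procedure expr( ) written directly from this production calls itself before consuming any input and loops forever • A → A α | β is rewritten as A → β R, R → α R | ε • expr → expr + term | term becomes • expr → term rest • rest → + term rest | ε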
2.5 A Translator for Simple Expressions
Adapting the Translation Scheme • Eliminate left recursion • A → A α | A β | γ is rewritten as A → γ R, R → α R | β R | ε • expr → expr + term {print('+')} | expr - term {print('-')} | term becomes • expr → term rest • rest → + term {print('+')} rest | - term {print('-')} rest | ε • term → 0 {print('0')} • … • term → 9 {print('9')}
Procedures for the Nonterminals expr, term, and rest
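A minimal sketch of what such procedures might look like for the scheme expr → term rest, rest → + term {print('+')} rest | - term {print('-')} rest | ε. It is not the deck's own listing: to keep it testable, output goes to a caller-supplied buffer instead of the terminal, and the names src, dst, room, emit, and infix_to_postfix are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *src;   /* input cursor (hypothetical name) */
static char *dst;         /* output cursor */
static size_t room;       /* output bytes still available, including the NUL */
static int bad_input;     /* set to 1 on a syntax error or full buffer */

/* Plays the role of print() in the translation scheme. */
static void emit(char c)
{
    if (room > 1) {
        *dst++ = c;
        room--;
    } else {
        bad_input = 1;    /* output buffer too small */
    }
}

/* term -> 0 {print('0')} | ... | 9 {print('9')} */
static void term(void)
{
    if (isdigit((unsigned char)*src))
        emit(*src++);
    else
        bad_input = 1;
}

/* rest -> + term {print('+')} rest | - term {print('-')} rest | epsilon */
static void rest(void)
{
    if (!bad_input && (*src == '+' || *src == '-')) {
        char op = *src++;  /* match the operator */
        term();
        emit(op);          /* the action runs after term, before the recursive rest */
        rest();
    }
    /* otherwise the epsilon production applies: do nothing */
}

/* expr -> term rest; returns 1 if the whole input was translated. */
int infix_to_postfix(const char *infix, char *out, size_t out_size)
{
    assert(infix != NULL && out != NULL && out_size > 0);
    src = infix;
    dst = out;
    room = out_size;
    bad_input = 0;
    term();
    rest();
    *dst = '\0';
    return !bad_input && *src == '\0';
}
```

Under these assumptions, translating 9-5+2 writes 95-2+ into the buffer. Note that rest ends with a call to itself, which is the tail recursion that can be replaced by iteration.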
Optimizing the Translator • Replacing tail recursion by iteration
void rest(void)
{
L:  /* jumping back to L replaces the tail-recursive call rest() */
    if (lookahead == '+') {
        match('+'); term(); putchar('+'); goto L;
    }
    else if (lookahead == '-') {
        match('-'); term(); putchar('-'); goto L;
    }
    /* otherwise the ε-production applies: nothing to do */
}
The Complete Program
2.6 Lexical Analysis • Removal of white space and comments • Blanks, tabs, newlines • Constants • Adding productions to the grammar for expressions, or • Creating a token num for constants • 31 + 28 + 59 • <num, 31> <+, > <num, 28> <+, > <num, 59> • Recognizing identifiers and keywords • Keywords are reserved • begin /* keyword */ count = count + increment; /* id = id + id */ end
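The token sequence shown for 31 + 28 + 59 can be produced by a small scanner. The sketch below is one possible illustration: the token codes TOK_NUM and TOK_DONE, the struct layout, and the function name tokenize are assumptions, and overflow of very long constants is not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token codes above any character value; the numbers are assumed. */
enum { TOK_NUM = 256, TOK_DONE = 257 };

struct token {
    int type;    /* TOK_NUM, TOK_DONE, or the character itself (e.g. '+') */
    int value;   /* attribute: the constant's value for TOK_NUM, else 0 */
};

/* Splits s into tokens, skipping blanks, tabs, and newlines.
 * Writes at most max tokens (including the final TOK_DONE) and returns
 * the number written, or 0 if the array is too small. */
size_t tokenize(const char *s, struct token *out, size_t max)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);
    while (n < max) {
        while (*s == ' ' || *s == '\t' || *s == '\n')
            s++;                                    /* strip white space */
        if (*s == '\0') {
            out[n].type = TOK_DONE;
            out[n].value = 0;
            return n + 1;
        }
        if (isdigit((unsigned char)*s)) {
            int value = 0;
            while (isdigit((unsigned char)*s))
                value = value * 10 + (*s++ - '0');  /* accumulate the constant */
            out[n].type = TOK_NUM;
            out[n].value = value;
        } else {
            out[n].type = (unsigned char)*s++;      /* single-character token */
            out[n].value = 0;
        }
        n++;
    }
    return 0;                                       /* ran out of room */
}
```

With this sketch the input 31 + 28 + 59 yields five tokens plus the end marker: num with attribute 31, +, num with attribute 28, +, and num with attribute 59. The parser then sees only num and +, not the individual digits or the blanks.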
Interface to the Lexical Analyzer • A lexical analyzer reads characters, groups them into lexemes, and passes the tokens formed by the lexemes, together with their attribute values, to the later stages of the compiler.