930 likes | 942 Views
Compilers. 4. Formal Grammars and Parsing and Top-down Parsing Chih-Hung Wang. References 1. C. N. Fischer, R. K. Cytron and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc., 2010. 2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons, 2000.
E N D
Compilers 4. Formal Grammars and Parsing and Top-down Parsing Chih-Hung Wang References 1. C. N. Fischer, R. K. Cytron and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc., 2010. 2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons, 2000. 3. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986. (2nd Ed. 2006)
Introduction • Context-free Grammar • The syntax of programming language constructs can be described by context-free grammar • Important aspects • A grammar serves to impose a structure on the linear sequence of tokens which is the program. • Using techniques from the field of formal languages, a grammar can be employed to construct a parser for it automatically. • Grammars aid programmers to write syntactically correct programs and provide answer to detailed questions about the syntax.
Context-Free Grammars • A context-free grammar(CFG) is a compact, finite representation of a language, defined by the following four components: • A finite terminal alphabet Σ • A finite non-terminal alphabet N • A start symbol S N • A finite set of productions P
Leftmost Derivations • A sentential form produced via a leftmost derivation is called a left sentential form. • The production sequence discovered by a large class of parsers (the top-down parsers) is a leftmost derivation. Hence, these parsers are said to produce a leftmost parse. • Example: f(V+V) E lm Prefix(E) lm f(E) lm f(V Tail) lm f(V+E) lm f(V+V Tail) lm f(V+V)
Rightmost Derivations • As a bottom-up parser discovers the productions that derive a given token sequence, it traces a rightmost derivation, but the productions are applied in reverse order. • Called rightmost or canonical parse • Example: f(V+V) E rm Prefix(E) rm Prefix(V Tail) rm Prefix(V+E) rm Prefix(V+V Tail) rm Prefix(V+V) rm f(V+V)
Parse Tree • It is rooted by the start symbol S • Each node is either a grammar symbol or
Properties of CFGs • The grammar may include useless symbols • The grammar may allow multiple, distinct derivations (parse trees) for some input string. • The grammar may include strings that do not belong in the language, or the grammar may exclude strings that are in the language.
Ambiguity (1) • Some grammars allow a derived string to have two or more different parse trees (and thus a nonunique structure). • Example: 1. Expr →Expr – Expr 2. | id • This grammar allows two different parse tree for id - id - id.
Parsers and Recognizers • Two approaches • A parser is considered top-down if it generates a parse tree by starting at the root of the tree, expanding the tree by applying productions in a depth-first manner. • The bottom-up parsers generate a parse tree by starting the tree’s leaves and working toward its root.
Two approaches of Parser • Deterministic left-to-right top-down • LL method • Deterministic left-to-right bottom-up • LR method • Left-to-right • The sequence of tokens is processed from left to right • Deterministic • No searching is involved: each token brings the parser one step closer to the goal of constructing the syntax tree
Pre-order and post-order (1) • The top-down method constructs the syntax tree in pre-order • The bottom-up method constructs the syntax tree in post-order
Principles of top-down parsing • The main task of a top-down parser is to choose the correct alternatives for known non-terminals
Principles of bottom-up parsing • The main task of a bottom-up parser is to repeatedly find the first node all of whose children have already been constructed.
Creating a top-down parser manually • Recursive descent parsing • Simplest way but has its limitations
Drawbacks • Three drawbacks • There is still some searching through the alternatives • The method often fails to produce a correct parser • Error handling leaves much to be desired
Second problems (1) • Example 1 • Index_element will never be tried • IDENTIFIER ‘[‘
Second problems (2) • Example 2 • The recognizer will not recognize ab
Second problems (3) • Example 3 • Recursive descent parsers cannot handle left-recursive grammars
Creating a top-down parser automatically • The principles of constructing a top-down parser automatically derive from those of writing one by hand, by applying precomputation. • Grammars which allow the construction of a top-down parser to be performed are called LL(1) grammars.
LL(1) parsing • FIRST set • The sets of first tokens produced by all alternatives in the grammar. • We have to precompute the FIRST sets of all non-terminals • The first sets of the terminals are obvious. • Finding FIRST() is trivial when starts with a terminal. • FIRST(N) is the union of the FIRST sets of its alternatives. • First()={a Σ| * a}
Predictive recursive descent parser • The FIRST sets can be used in the construction of a predictive parser because it predicts the presence of a given alternative without trying to find out if it is there.
Closure algorithm for computing the FIRST set (1) • Data definitions
Closure algorithm for computing the FIRST set (2) • Initializations
Closure algorithm for computing the FIRST set (3) • Inference rules
FIRST sets example(1) • Grammar
FIRST sets example(2) • The initial FIRST sets
FIRST sets example(3) • The final FIRST sets
Practice • Find the FIRST sets of all alternative of the following grammar. • E -> TE’ • E’->+TE’| • T->FT’ • T’->*FT’| • F->(E)|id
Nullable alternatives • A complication arises with the case label for the empty alternative (ex. rest_expression). Since it does not itself start with any token, how can we decide whether it is the correct alternative?
FOLLOW sets • Follow sets • Determining the set of tokens that can immediately follow a given non-terminal N. • LL(1) parser • ‘LL’ because the parser works from Left to right identifying the nodes in what is called Leftmost derivation order. • ‘(1)’ because all choices are based on a one token look-ahead. • Follow(A)={b Σ |S+ Ab β}
Recall the predictive parser rest_expression ‘+’ expression | FIRST(rest_expr) = {‘+’, } void rest_expression(void) {switch (Token.class) {case '+': token('+'); expression(); break;case EOF: case ')': break;default: error(); } } FOLLOW(rest_expr) = {EOF, ‘)’}
LL(1) conflicts • Example • The codes