Compilers

Compilers 4. Formal Grammars and Parsing and Top-down Parsing Chih-Hung Wang References 1. C. N. Fischer, R. K. Cytron and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc., 2010. 2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons, 2000. 3. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986. (2nd Ed. 2006)

Introduction • Context-free Grammar • The syntax of programming language constructs can be described by context-free grammar • Important aspects • A grammar serves to impose a structure on the linear sequence of tokens which is the program. • Using techniques from the field of formal languages, a grammar can be employed to construct a parser for it automatically. • Grammars aid programmers to write syntactically correct programs and provide answer to detailed questions about the syntax.

The role of the parser

Context-Free Grammars • A context-free grammar(CFG) is a compact, finite representation of a language, defined by the following four components: • A finite terminal alphabet Σ • A finite non-terminal alphabet N • A start symbol S N • A finite set of productions P

A Simple Expression Grammar

Leftmost Derivations • A sentential form produced via a leftmost derivation is called a left sentential form. • The production sequence discovered by a large class of parsers (the top-down parsers) is a leftmost derivation. Hence, these parsers are said to produce a leftmost parse. • Example: f(V+V) E lm Prefix(E)  lm f(E)  lm f(V Tail)  lm f(V+E)  lm f(V+V Tail)  lm f(V+V)

Rightmost Derivations • As a bottom-up parser discovers the productions that derive a given token sequence, it traces a rightmost derivation, but the productions are applied in reverse order. • Called rightmost or canonical parse • Example: f(V+V) E rm Prefix(E)  rm Prefix(V Tail)  rm Prefix(V+E)  rm Prefix(V+V Tail)  rm Prefix(V+V)  rm f(V+V)

Parse Tree • It is rooted by the start symbol S • Each node is either a grammar symbol or 

Properties of CFGs • The grammar may include useless symbols • The grammar may allow multiple, distinct derivations (parse trees) for some input string. • The grammar may include strings that do not belong in the language, or the grammar may exclude strings that are in the language.

Ambiguity (1) • Some grammars allow a derived string to have two or more different parse trees (and thus a nonunique structure). • Example: 1. Expr →Expr – Expr 2. | id • This grammar allows two different parse tree for id - id - id.

Ambiguity (2)

Parsers and Recognizers • Two approaches • A parser is considered top-down if it generates a parse tree by starting at the root of the tree, expanding the tree by applying productions in a depth-first manner. • The bottom-up parsers generate a parse tree by starting the tree’s leaves and working toward its root.

Two approaches of Parser • Deterministic left-to-right top-down • LL method • Deterministic left-to-right bottom-up • LR method • Left-to-right • The sequence of tokens is processed from left to right • Deterministic • No searching is involved: each token brings the parser one step closer to the goal of constructing the syntax tree

Parsers (Top-down)

Parsers (bottom-Up)

Pre-order and post-order (1) • The top-down method constructs the syntax tree in pre-order • The bottom-up method constructs the syntax tree in post-order

Pre-order and post-order (2)

Principles of top-down parsing • The main task of a top-down parser is to choose the correct alternatives for known non-terminals

Principles of bottom-up parsing • The main task of a bottom-up parser is to repeatedly find the first node all of whose children have already been constructed.

Creating a top-down parser manually • Recursive descent parsing • Simplest way but has its limitations

Recursive descent parsing program (1)

Recursive descent parsing program (2)

Drawbacks • Three drawbacks • There is still some searching through the alternatives • The method often fails to produce a correct parser • Error handling leaves much to be desired

Second problems (1) • Example 1 • Index_element will never be tried • IDENTIFIER ‘[‘

Second problems (2) • Example 2 • The recognizer will not recognize ab

Second problems (3) • Example 3 • Recursive descent parsers cannot handle left-recursive grammars

Creating a top-down parser automatically • The principles of constructing a top-down parser automatically derive from those of writing one by hand, by applying precomputation. • Grammars which allow the construction of a top-down parser to be performed are called LL(1) grammars.

LL(1) parsing • FIRST set • The sets of first tokens produced by all alternatives in the grammar. • We have to precompute the FIRST sets of all non-terminals • The first sets of the terminals are obvious. • Finding FIRST() is trivial when  starts with a terminal. • FIRST(N) is the union of the FIRST sets of its alternatives. • First()={a Σ|  * a}

Predictive recursive descent parser • The FIRST sets can be used in the construction of a predictive parser because it predicts the presence of a given alternative without trying to find out if it is there.

Closure algorithm for computing the FIRST set (1) • Data definitions

Closure algorithm for computing the FIRST set (2) • Initializations

Closure algorithm for computing the FIRST set (3) • Inference rules

FIRST sets example(1) • Grammar

FIRST sets example(2) • The initial FIRST sets

FIRST sets example(3) • The final FIRST sets

Another Example of First Set

Another Example of First Set (II)

Algorithms of Computing First(α)

The predictive parser (1)

The predictive parser (2)

Practice • Find the FIRST sets of all alternative of the following grammar. • E -> TE’ • E’->+TE’| • T->FT’ • T’->*FT’| • F->(E)|id

Nullable alternatives • A complication arises with the case label for the empty alternative (ex. rest_expression). Since it does not itself start with any token, how can we decide whether it is the correct alternative?

FOLLOW sets • Follow sets • Determining the set of tokens that can immediately follow a given non-terminal N. • LL(1) parser • ‘LL’ because the parser works from Left to right identifying the nodes in what is called Leftmost derivation order. • ‘(1)’ because all choices are based on a one token look-ahead. • Follow(A)={b Σ |S+  Ab β}

Closure algorithm for computing the FOLLOW sets

The first and follow sets

Another Example of Follow Set

Another Example of Follow Set (II)

Algorithm of Follow(A)

Recall the predictive parser rest_expression  ‘+’ expression | FIRST(rest_expr) = {‘+’, } void rest_expression(void) {switch (Token.class) {case '+': token('+'); expression(); break;case EOF: case ')': break;default: error(); } } FOLLOW(rest_expr) = {EOF, ‘)’}

LL(1) conflicts • Example • The codes

Compilers

Compilers

Presentation Transcript

Advanced Compilers

Honors Compilers

Compilers

Compilers

Compilers:

Honors Compilers

Optimizing Compilers

Compilers

COMPILERS

Compilers

COMPILERS

Compilers

Advanced Compilers

Compilers

Compilers

Compilers

Compilers