Parsing Programming Language Principles Lecture 3 Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida
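Context-Free Grammars • Definition: A context-free grammar (CFG) is a quadruple G = (Φ, Σ, P, S), where Φ is the set of nonterminals, Σ is the set of terminals, P is the set of productions, and S ∈ Φ is the start symbol. All productions are of the form A → ω, for A ∈ Φ and ω ∈ (Φ ∪ Σ)*. • Re-writing using grammar rules: • βAγ => βωγ if A → ω (derivation).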
String Derivations • Left-most derivation: At each step, the left-most nonterminal is re-written. • Right-most derivation: At each step, the right-most nonterminal is re-written.
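The difference between the two derivation orders can be sketched in a few lines of Python. This is only an illustration: the tiny grammar E → E + T | T, T → i and the helper names `step` and `derive` are assumptions made for the example, not part of the lecture.

```python
# Illustrative grammar (an assumption for this sketch): E -> E + T | T,  T -> i
GRAMMAR = {"E": [["E", "+", "T"], ["T"]], "T": [["i"]]}

def step(form, rhs, leftmost=True):
    """Rewrite the left-most (or right-most) nonterminal of `form` using `rhs`."""
    positions = [k for k, sym in enumerate(form) if sym in GRAMMAR]
    k = positions[0] if leftmost else positions[-1]
    assert rhs in GRAMMAR[form[k]], "rhs must be a production of that nonterminal"
    return form[:k] + rhs + form[k + 1:]

def derive(rhs_sequence, leftmost=True):
    """Apply the given right-hand sides in order, starting from E; return every sentential form."""
    form = ["E"]
    trace = [" ".join(form)]
    for rhs in rhs_sequence:
        form = step(form, rhs, leftmost)
        trace.append(" ".join(form))
    return trace

# Left-most:  E => E + T => T + T => i + T => i + i
left = derive([["E", "+", "T"], ["T"], ["i"], ["i"]], leftmost=True)
# Right-most: E => E + T => E + i => T + i => i + i
right = derive([["E", "+", "T"], ["i"], ["T"], ["i"]], leftmost=False)
```

Both traces end in the same sentence, i + i, and use the same productions; only the order of the re-writes differs.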
Derivation Trees Derivation trees: Describe re-writes, independently of the order (left-most or right-most). • Each tree branch matches a production rule in the grammar.
Derivation Trees Notes: • Leaves are terminals. • Bottom contour is the sentence. • Left recursion causes left branching. • Right recursion causes right branching.
Goal of Parsing • Examine input string, determine whether it's legal. • Equivalent to building derivation tree. • Added benefit: tree embodies syntactic structure of input. • Therefore, tree should be unique.
Ambiguous Grammars • Definition: A CFG is ambiguous if some sentence z has two different right-most derivations (or two different left-most derivations; one left-most and one right-most derivation of z do not count as "two different derivations"). • (Equivalent) Definition: A CFG is ambiguous if there exist two different derivation trees for some sentence z.
Ambiguous Grammars Classic ambiguities:
• Simultaneous left/right recursion:
    E → E + E
      → i
• Dangling else problem:
    S → if E then S
      → if E then S else S
      → …
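To make the first ambiguity concrete, the following sketch counts the derivation trees that E → E + E | i admits for a given sentence. The function name `count_trees` and the token-list input format are assumptions for this example; the counting itself is a straightforward recursion over where the top-level + can sit.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count the derivation trees of `tokens` under E -> E + E | i."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of trees deriving tokens[lo:hi] from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "i" else 0
        total = 0
        # Try every '+' as the operator of the top-most production E -> E + E.
        for k in range(lo + 1, hi - 1):
            if tokens[k] == "+":
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

two_operands = count_trees("i + i".split())        # 1 tree
three_operands = count_trees("i + i + i".split())  # 2 trees: the sentence is ambiguous
```

A count above 1 for any sentence is enough to show that the grammar is ambiguous.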
Operator Precedence and Associativity • Let's build a CFG for expressions consisting of: • elementary identifier i. • + and - (binary ops) have lowest precedence, and are left associative. • * and / (binary ops) have middle precedence, and are right associative. • + and - (unary ops) have highest precedence, and are right associative.
Corresponding Grammar for Expressions
    E → E + T        E consists of T's,
      → E - T        separated by -'s and +'s
      → T            (lowest precedence).
    T → F * T        T consists of F's,
      → F / T        separated by *'s and /'s
      → F            (next precedence).
    F → - F          F consists of a single P,
      → + F          preceded by +'s and -'s
      → P            (next precedence).
    P → '(' E ')'    P consists of a parenthesized E,
      → i            or a single i (highest precedence).
Operator Precedence and Associativity • Operator precedence: • The lower in the grammar, the higher the precedence. • Operator Associativity: • Tie breaker for precedence. • Left recursion in the grammar means • left associativity of the operator, • left branching in the tree. • Right recursion in the grammar means • right associativity of the operator, • right branching in the tree.
Building Derivation Trees Sample Input : - + i - i * ( i + i ) / i + i (Human) derivation tree construction: • Bottom-up. • On each pass, scan entire expression, process operators with highest precedence (parentheses are highest). • Lowest precedence operators are last, at the top of tree.
Abstract Syntax Trees • AST is a condensed version of the derivation tree. • No noise (intermediate nodes). • String-to-tree transduction grammar: • rules of the form A → ω => 's'. • Build 's' tree node, with one child per tree from each nonterminal in ω.
Example
    E → E + T        => +
      → E - T        => -
      → T
    T → F * T        => *
      → F / T        => /
      → F
    F → - F          => neg
      → + F          => +
      → P
    P → '(' E ')'
      → i            => i
String-to-Tree Transduction • We transduce from vocabulary of input symbols, to vocabulary of tree node names. • Could eliminate construction of unary + node, anticipating semantics.
    F → - F          => neg
      → + F          // no more unary + node
      → P
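One way to see the transduction rules in action is a small hand-written recursive-descent parser that returns the AST as nested tuples. This is a sketch under assumptions, not the lecture's method: the function name `parse`, the tuple representation, and the `keep_unary_plus` flag are invented for the example, and the left recursion in E is replaced by a loop, since a recursive procedure cannot call itself first without consuming input.

```python
def parse(tokens, keep_unary_plus=True):
    """Build an AST (nested tuples) from a token list such as ['i', '+', 'i']."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}, found {peek()!r}")
        pos += 1

    def E():  # E -> E + T => +  |  E - T => -  |  T   (left recursion as a loop)
        node = T()
        while peek() in ("+", "-"):
            op = peek()
            expect(op)
            node = (op, node, T())      # left branching: left associative
        return node

    def T():  # T -> F * T => *  |  F / T => /  |  F   (right recursion)
        left = F()
        if peek() in ("*", "/"):
            op = peek()
            expect(op)
            return (op, left, T())      # right branching: right associative
        return left

    def F():  # F -> - F => neg  |  + F => +  |  P
        if peek() == "-":
            expect("-")
            return ("neg", F())
        if peek() == "+":
            expect("+")
            child = F()
            # With keep_unary_plus=False no node is built for unary +.
            return ("+", child) if keep_unary_plus else child
        return P()

    def P():  # P -> '(' E ')'  |  i => i   (parentheses leave no node)
        if peek() == "(":
            expect("(")
            node = E()
            expect(")")
            return node
        expect("i")
        return "i"

    tree = E()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree

left_assoc = parse("i - i - i".split())    # ('-', ('-', 'i', 'i'), 'i')
right_assoc = parse("i * i * i".split())   # ('*', 'i', ('*', 'i', 'i'))
```

The intermediate nodes E, T, F and P never appear in the result; only the node names on the right of => do, which is what makes the AST a condensed derivation tree.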
The Game of Syntactic Dominoes • The grammar:
    E → E+T      T → P*T      P → (E)
      → T          → P          → i
• The playing pieces: An arbitrary supply of each piece (one per grammar rule). • The game board: • Start domino at the top. • Bottom dominoes are the "input."
The Game of Syntactic Dominoes • Game rules: • Add game pieces to the board. • Match the flat parts and the symbols. • Lines are infinitely elastic. • Object of the game: • Connect start domino with the input dominoes. • Leave no unmatched flat parts.
Parsing Strategies • Same as for the game of syntactic dominoes. • "Top-down" parsing: start at the start symbol, work toward the input string. • "Bottom-up" parsing: start at the input string, work toward the goal symbol. • In either strategy, can process the input left-to-right or right-to-left.
Top-Down Parsing • Attempt a left-most derivation, by predicting the re-write that will match the remaining input. • Use a string α (a stack, really) from which the input can be derived.
Top-Down Parsing Start with S on the stack. At every step, two alternatives: • α (the stack) begins with a terminal t. Match t against the first input symbol. • α begins with a nonterminal A. Consult an OPF (Omniscient Parsing Function) to determine which production for A would lead to a match with the first symbol of the input. The OPF does the "predicting" in such a predictive parser.
Classical Top-Down Parsing Algorithm
    Push (Stack, S);
    while not Empty (Stack) do
      if Top(Stack) is a terminal
      then if Top(Stack) = Head(input)
           then input := tail(input)
                Pop(Stack)
           else error (Stack, input)
      else P := OPF (Stack, input)
           Push (Pop(Stack), RHS(P))
    od
Top-Down Parsing • Most parsing methods impose bounds on the amount of stack lookback and input lookahead. For programming languages, a common choice is (1,1): one stack symbol and one input symbol. • We must define OPF (A,t), where A is the top element of the stack, and t is the first symbol on the input. • Storage requirements: O(n²), where n is the size of the grammar vocabulary (a few hundred).
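With (1,1) bounds the OPF becomes a table lookup on (A, t), and the classical algorithm can be sketched as below. The grammar S → A, A → bAd | (empty), the hand-written `OPF` table, the `$` end-of-input marker, and the function name `parse` are all assumptions chosen for illustration.

```python
END = "$"  # assumed end-of-input marker

# Hypothetical OPF table for the illustrative grammar
#   S -> A      A -> b A d      A -> (empty)
OPF = {
    ("S", "b"): ["A"],
    ("S", END): ["A"],
    ("A", "b"): ["b", "A", "d"],
    ("A", "d"): [],
    ("A", END): [],
}
NONTERMINALS = {"S", "A"}

def parse(tokens, start="S"):
    """Return the productions used, in order, in a left-most derivation of `tokens`."""
    inp = list(tokens) + [END]
    stack = [start]                 # the top of the stack is the end of the list
    used = []
    while stack:
        top = stack.pop()
        if top not in NONTERMINALS:          # terminal: must match the input
            if top != inp[0]:
                raise SyntaxError(f"expected {top!r}, found {inp[0]!r}")
            inp.pop(0)
        else:                                # nonterminal: predict a production
            rhs = OPF.get((top, inp[0]))
            if rhs is None:
                raise SyntaxError(f"no production for {top} on {inp[0]!r}")
            used.append((top, rhs))
            stack.extend(reversed(rhs))      # left-most RHS symbol ends up on top
    if inp != [END]:
        raise SyntaxError(f"unconsumed input: {inp[:-1]}")
    return used

productions = parse("bbdd")
# [('S', ['A']), ('A', ['b', 'A', 'd']), ('A', ['b', 'A', 'd']), ('A', [])]
```

The table has one row per nonterminal and one column per terminal, which is where the quadratic storage bound comes from.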
LL(1) Grammars Definition: A CFG G is LL(1) (Left-to-right, Left-most, one-symbol lookahead) iff for all A ∈ Φ, and for all A → α, A → β with α ≠ β, Select (A → α) ∩ Select (A → β) = ∅. • Previous example: Grammar is not LL(1). • More later on why, and what to do about it.
Example:
    S → A        {b, ⊥}
    A → bAd      {b}
      → ε        {d, ⊥}
(⊥ marks the end of the input.) Disjoint! Grammar is LL(1)! (At most) one production per entry.
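The Select sets in this example can be reproduced mechanically. The sketch below uses the usual textbook construction, Select(A → ω) = First(ω), plus Follow(A) when ω can derive the empty string; the slides have not defined Select at this point, so that construction, the function names `select_sets` and `is_ll1`, and the `$` character standing in for the end-of-input marker are assumptions of the example.

```python
END = "$"  # assumed end-of-input marker

def select_sets(grammar, start):
    """grammar: {nonterminal: [rhs, ...]}; each rhs is a list of symbols, [] is the empty string."""
    nullable = set()
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    follow[start].add(END)

    def first_of(seq):
        """Return (First set of seq, True if seq can derive the empty string)."""
        out = set()
        for sym in seq:
            if sym not in grammar:           # terminal
                out.add(sym)
                return out, False
            out |= first[sym]
            if sym not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:                           # iterate to a fixed point
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                f, empty = first_of(rhs)
                if not f <= first[a] or (empty and a not in nullable):
                    first[a] |= f
                    if empty:
                        nullable.add(a)
                    changed = True
                for k, sym in enumerate(rhs):
                    if sym in grammar:       # update Follow of each nonterminal in rhs
                        f, empty = first_of(rhs[k + 1:])
                        new = f | (follow[a] if empty else set())
                        if not new <= follow[sym]:
                            follow[sym] |= new
                            changed = True

    select = {}
    for a, alternatives in grammar.items():
        for rhs in alternatives:
            f, empty = first_of(rhs)
            select[(a, tuple(rhs))] = f | (follow[a] if empty else set())
    return select

def is_ll1(grammar, start):
    """True if the Select sets of each nonterminal's productions are pairwise disjoint."""
    select = select_sets(grammar, start)
    for a, alternatives in grammar.items():
        seen = set()
        for rhs in alternatives:
            s = select[(a, tuple(rhs))]
            if seen & s:
                return False
            seen |= s
    return True

G = {"S": [["A"]], "A": [["b", "A", "d"], []]}
sel = select_sets(G, "S")
```

For this grammar `sel` holds {b, $} for S → A, {b} for A → bAd, and {d, $} for the empty production, and `is_ll1(G, "S")` is true. For E → E + E | i both productions select on i, so the same check fails.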