
Parsing

Parsing. Programming Language Principles Lecture 3. Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida. Context-Free Grammars.


Presentation Transcript


  1. Parsing Programming Language Principles Lecture 3 Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida

  2. Context-Free Grammars • Definition: A context-free grammar (CFG) is a quadruple G = (Φ, Σ, P, S), where Φ is the set of nonterminals, Σ the set of terminals, P the set of productions, and S the start symbol. All productions are of the form A → α, for A ∈ Φ and α ∈ (Φ ∪ Σ)*. • Re-writing using grammar rules: βAγ => βαγ if A → α (derivation).

  3. String Derivations • Left-most derivation: At each step, the left-most nonterminal is re-written. • Right-most derivation: At each step, the right-most nonterminal is re-written.
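
The two derivation orders can be illustrated with a small sketch. The toy grammar below (S → A B, A → a, B → b) is a hypothetical example, not one from the lecture; each nonterminal has a single production, so the only choice left is which nonterminal to rewrite next.

```python
# Hypothetical toy grammar: each nonterminal has exactly one production,
# so the only choice is which nonterminal to rewrite.
GRAMMAR = {"S": ["A", "B"], "A": ["a"], "B": ["b"]}

def derive(start, leftmost=True):
    """Return the list of sentential forms from `start` to a sentence."""
    form = [start]
    steps = [" ".join(form)]
    while True:
        # Positions of the nonterminals in the current sentential form.
        positions = [k for k, sym in enumerate(form) if sym in GRAMMAR]
        if not positions:
            return steps
        k = positions[0] if leftmost else positions[-1]
        form = form[:k] + GRAMMAR[form[k]] + form[k + 1:]
        steps.append(" ".join(form))

leftmost_steps = derive("S", leftmost=True)
rightmost_steps = derive("S", leftmost=False)
print(" => ".join(leftmost_steps))   # S => A B => a B => a b
print(" => ".join(rightmost_steps))  # S => A B => A b => a b
```

Both derivations end in the same sentence and differ only in the order of the re-writes.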

  4. Derivation Trees Derivation trees: Describe re-writes, independently of the order (left-most or right-most). • Each tree branch matches a production rule in the grammar.

  5. Derivation Trees Notes: • Leaves are terminals. • Bottom contour is the sentence. • Left recursion causes left branching. • Right recursion causes right branching.

  6. Goal of Parsing • Examine input string, determine whether it's legal. • Equivalent to building derivation tree. • Added benefit: tree embodies syntactic structure of input. • Therefore, tree should be unique.

  7. Ambiguous Grammars • Definition: A CFG is ambiguous if there exist two different right-most derivations (or two different left-most derivations, but not one of each) for some sentence z. • (Equivalent) Definition: A CFG is ambiguous if there exist two different derivation trees for some sentence z.

  8. Ambiguous Grammars Classic ambiguities:
      • Simultaneous left/right recursion:
          E → E + E
            → i
      • Dangling else problem:
          S → if E then S
            → if E then S else S
            →
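
The ambiguity of E → E + E | i can be checked by counting derivation trees. The sketch below is only an illustration: the function name `count_trees` and the dynamic-programming approach are assumptions, not part of the lecture. It counts the ways a token sequence can be split at a top-level `+`.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count derivation trees of `tokens` under E -> E + E | i."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways E derives tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] == "i" else 0
        total = 0
        # Try every '+' as the top-level operator of E -> E + E.
        for k in range(lo + 1, hi - 1):
            if tokens[k] == "+":
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

print(count_trees("i + i".split()))      # 1
print(count_trees("i + i + i".split()))  # 2
```

Two trees for `i + i + i` means the sentence has two distinct derivation trees, so the grammar is ambiguous.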

  9. Operator Precedence and Associativity • Let’s build a CFG for expressions consisting of: • elementary identifier i. • + and - (binary ops) have lowest precedence, and are left associative. • * and / (binary ops) have middle precedence, and are right associative. • + and - (unary ops) have highest precedence, and are right associative.

  10. Corresponding Grammar for Expressions
      E → E + T        E consists of T's,
        → E - T        separated by -'s and +'s
        → T            (lowest precedence).
      T → F * T        T consists of F's,
        → F / T        separated by *'s and /'s
        → F            (next precedence).
      F → - F          F consists of a single P,
        → + F          preceded by +'s and -'s
        → P            (next precedence).
      P → '(' E ')'    P consists of a parenthesized E,
        → i            or a single i (highest precedence).

  11. Operator Precedence and Associativity • Operator precedence: • The lower in the grammar, the higher the precedence. • Operator Associativity: • Tie breaker for precedence. • Left recursion in the grammar means • left associativity of the operator, • left branching in the tree. • Right recursion in the grammar means • right associativity of the operator, • right branching in the tree.
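
The practical effect of associativity shows up when the two groupings are evaluated on numbers. The helper names `fold_left` and `fold_right` below are hypothetical, and the arithmetic is only an illustration of the two tree shapes.

```python
from functools import reduce
import operator

def fold_left(op, operands):
    """((a op b) op c): the grouping produced by left recursion."""
    return reduce(op, operands)

def fold_right(op, operands):
    """(a op (b op c)): the grouping produced by right recursion."""
    *init, last = operands
    return reduce(lambda acc, x: op(x, acc), reversed(init), last)

print(fold_left(operator.sub, [8, 3, 2]))   # 3, i.e. (8 - 3) - 2
print(fold_right(operator.sub, [8, 3, 2]))  # 7, i.e. 8 - (3 - 2)
```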

  12. Building Derivation Trees Sample Input: - + i - i * ( i + i ) / i + i • (Human) derivation tree construction: • Bottom-up. • On each pass, scan entire expression, process operators with highest precedence (parentheses are highest). • Lowest precedence operators are last, at the top of tree.

  13. Abstract Syntax Trees • AST is a condensed version of the derivation tree. • No noise (intermediate nodes). • String-to-tree transduction grammar: • rules of the form A → ω => 's'. • Build 's' tree node, with one child per tree from each nonterminal in ω.

  14. Example
      E → E + T        => +
        → E - T        => -
        → T
      T → F * T        => *
        → F / T        => /
        → F
      F → - F          => neg
        → + F          => +
        → P
      P → '(' E ')'
        → i            => i

  15. Sample Input: - + i - i * ( i + i ) / i + i
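
The transduction grammar of slide 14 can be sketched as a small hand-written parser that returns the AST as nested tuples. This is an assumption-laden illustration, not the lecture's method: it uses recursive descent, replaces the left recursion in E with a loop (which still yields left branching), and the tuple encoding and function names are hypothetical.

```python
def parse(tokens):
    """Parse a token list and return the AST as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_E():
        # E -> E + T | E - T | T : left recursion becomes a loop,
        # so the tree branches to the left.
        node = parse_T()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, parse_T())
        return node

    def parse_T():
        # T -> F * T | F / T | F : right recursion, right branching.
        left = parse_F()
        if peek() in ("*", "/"):
            op = advance()
            return (op, left, parse_T())
        return left

    def parse_F():
        # F -> - F => neg | + F => + | P
        if peek() == "-":
            advance()
            return ("neg", parse_F())
        if peek() == "+":
            advance()
            return ("+", parse_F())
        return parse_P()

    def parse_P():
        # P -> '(' E ')' | i ; parentheses leave no node in the AST.
        tok = advance()
        if tok == "i":
            return "i"
        if tok == "(":
            node = parse_E()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = parse_E()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

SAMPLE = "- + i - i * ( i + i ) / i + i".split()
print(parse("i - i - i".split()))  # ('-', ('-', 'i', 'i'), 'i')
print(parse("i / i / i".split()))  # ('/', 'i', ('/', 'i', 'i'))
print(parse(SAMPLE))
```

The first two calls show the left branching of `-` and the right branching of `/`; the last call builds the tree for the sample input, with the lowest-precedence operator at the root.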

  16. String-to-Tree Transduction • We transduce from the vocabulary of input symbols to the vocabulary of tree node names. • Could eliminate construction of unary + node, anticipating semantics.
      F → - F          => neg
        → + F          // no more unary + node
        → P

  17. The Game of Syntactic Dominoes • The grammar:
      E → E+T      T → P*T      P → (E)
        → T          → P          → i
      • The playing pieces: An arbitrary supply of each piece (one per grammar rule). • The game board: • Start domino at the top. • Bottom dominoes are the "input."

  18. The Game of Syntactic Dominoes • Game rules: • Add game pieces to the board. • Match the flat parts and the symbols. • Lines are infinitely elastic. • Object of the game: • Connect start domino with the input dominoes. • Leave no unmatched flat parts.

  19. Parsing Strategies • Same as for the game of syntactic dominoes. • “Top-down” parsing: start at the start symbol, work toward the input string. • “Bottom-up” parsing: start at the input string, work toward the goal symbol. • In either strategy, can process the input left-to-right or right-to-left.

  20. Top-Down Parsing • Attempt a left-most derivation, by predicting the re-write that will match the remaining input. • Use a string (a stack, really) from which the input can be derived.

  21. Top-Down Parsing Start with S on the stack. At every step, two alternatives: • The stack begins with a terminal t. Match t against the first input symbol. • The stack begins with a nonterminal A. Consult an OPF (Omniscient Parsing Function) to determine which production for A would lead to a match with the first symbol of the input. The OPF does the “predicting” in such a predictive parser.

  22. Classical Top-Down Parsing Algorithm
      Push (Stack, S);
      while not Empty (Stack) do
        if Top(Stack) is a terminal then
          if Top(Stack) = Head(input) then
            input := tail(input)
            Pop(Stack)
          else error (Stack, input)
        else
          P := OPF (Stack, input)
          Push (Pop(Stack), RHS(P))
      od
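
The algorithm above can be sketched in Python. Several things here are assumptions made for illustration: the grammar (S → A, A → b A d | empty), the end-of-input marker `⊥`, and the hand-written `TABLE` that stands in for the OPF by looking only at the top of the stack and the first input symbol.

```python
END = "⊥"  # assumed end-of-input marker

# Hypothetical grammar: S -> A, A -> b A d | (empty).
TERMINALS = {"b", "d"}

# Hand-written stand-in for the OPF: (nonterminal, lookahead) -> right-hand side.
TABLE = {
    ("S", "b"): ["A"], ("S", END): ["A"],
    ("A", "b"): ["b", "A", "d"],
    ("A", "d"): [], ("A", END): [],
}

def opf(stack, tokens):
    """Pick a production from the top of stack and the first input symbol."""
    return TABLE.get((stack[-1], tokens[0]))

def parse(text):
    """Return True iff `text` is derivable from S."""
    tokens = list(text) + [END]
    stack = ["S"]                        # Push(Stack, S)
    while stack:
        top = stack[-1]
        if top in TERMINALS:
            if top != tokens[0]:
                return False             # error(Stack, input)
            tokens.pop(0)                # input := tail(input)
            stack.pop()
        else:
            rhs = opf(stack, tokens)
            if rhs is None:
                return False             # no production predicted
            stack.pop()
            stack.extend(reversed(rhs))  # leftmost RHS symbol ends on top
    return tokens == [END]               # all input consumed

print(parse("bbdd"))  # True
print(parse("bbd"))   # False
```

Pushing the right-hand side in reverse keeps its leftmost symbol on top of the stack, which is what makes the sequence of predictions a left-most derivation.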

  23. Top-Down Parsing • Most parsing methods impose bounds on the amount of stack lookback and input lookahead. For programming languages, a common choice is (1,1). • We must define OPF (A,t), where A is the top element of the stack, and t is the first symbol on the input. • Storage requirements: O(n²), where n is the size of the grammar vocabulary (a few hundred).

  24. LL(1) Grammars Definition: A CFG G is LL(1) (Left-to-right, Left-most, one-symbol lookahead) iff for all nonterminals A, and for all pairs of productions A → α, A → β with α ≠ β, Select (A → α) ∩ Select (A → β) = ∅. • Previous example: Grammar is not LL(1). • More later on why, and what to do about it.

  25. Example:
      S → A            {b, ⊥}
      A → bAd          {b}
        → ε            {d, ⊥}
      (⊥ marks the end of the input; ε is the empty string.) Disjoint! Grammar is LL(1)! (At most) one production per entry.
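
The Select sets of this example can be recomputed mechanically. The sketch below assumes the usual construction (Select of a production is the First set of its right-hand side, plus the Follow set of its left-hand side when the right-hand side can derive the empty string), which the slides have not yet defined; the marker `⊥` and all function names are likewise assumptions.

```python
END = "⊥"  # assumed end-of-input marker

# Productions of the slide's grammar; [] stands for the empty right-hand side.
PRODUCTIONS = [("S", ["A"]), ("A", ["b", "A", "d"]), ("A", [])]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}
START = "S"

def first_of(symbols, first, nullable):
    """First set of a symbol string, plus whether it can derive empty."""
    result = set()
    for sym in symbols:
        if sym not in NONTERMINALS:
            result.add(sym)
            return result, False
        result |= first[sym]
        if sym not in nullable:
            return result, False
    return result, True

def select_sets():
    first = {n: set() for n in NONTERMINALS}
    follow = {n: set() for n in NONTERMINALS}
    follow[START].add(END)
    nullable = set()
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for lhs, rhs in PRODUCTIONS:
            f, empty = first_of(rhs, first, nullable)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            if empty and lhs not in nullable:
                nullable.add(lhs)
                changed = True
            for k, sym in enumerate(rhs):
                if sym in NONTERMINALS:
                    f, empty = first_of(rhs[k + 1:], first, nullable)
                    new = f | (follow[lhs] if empty else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    select = []
    for lhs, rhs in PRODUCTIONS:
        f, empty = first_of(rhs, first, nullable)
        select.append(f | (follow[lhs] if empty else set()))
    return select

def is_ll1(select):
    """True iff productions sharing a left-hand side have disjoint Select sets."""
    for a in range(len(PRODUCTIONS)):
        for b in range(a + 1, len(PRODUCTIONS)):
            if PRODUCTIONS[a][0] == PRODUCTIONS[b][0] and select[a] & select[b]:
                return False
    return True

SELECT = select_sets()
for (lhs, rhs), sel in zip(PRODUCTIONS, SELECT):
    print(lhs, "->", " ".join(rhs) or "(empty)", sorted(sel))
print(is_ll1(SELECT))  # True
```

The two productions for A have Select sets with no symbol in common, so each (nonterminal, lookahead) table entry holds at most one production.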

