Chap. 6, Bottom-Up Parsing

Chap. 6, Bottom-Up Parsing. J. H. Wang May 17, 2011. Outline. Overview Shift-Reduce Parsers LR(0) Table Construction Conflict Diagnosis Conflict Resolution and Table Construction. Overview. Problems in top-down parsers Left-recursion Common prefixes (Fig. 5.12 vs. Fig. 5.16)

Presentation Transcript


  1. Chap. 6, Bottom-Up Parsing J. H. Wang May 17, 2011

  2. Outline • Overview • Shift-Reduce Parsers • LR(0) Table Construction • Conflict Diagnosis • Conflict Resolution and Table Construction

  3. Overview • Problems in top-down parsers • Left-recursion • Common prefixes • (Fig. 5.12 vs. Fig. 5.16) • Bottom-up parsers can handle the largest class of grammars that can be parsed deterministically

  5. A bottom-up parser begins with the parse tree’s leaves and moves toward its root • A bottom-up parser traces a rightmost derivation in reverse • A bottom-up parser uses a grammar rule to replace the rule’s RHS with its LHS • (Fig. 4.5 & Fig. 4.6)

  6. Bottom-up: from terminal symbols to the goal symbol • Shift-reduce: the two most prevalent actions • Shift symbols onto the parse stack • Reduce a string of symbols to a nonterminal • LR(k): scan the input from the left, producing a rightmost derivation in reverse, using k symbols of lookahead • LR parsers are more general than LL parsers • Yacc: LR parser generator

  7. Shift-Reduce Parsers • LR parsers and rightmost derivations • LR parsers construct rightmost derivations in reverse • Fig. 6.2 • LR parsing as knitting • How the RHS of a production is found • Fig. 6.1

  8. In Fig. 6.1: • Right needle: unprocessed portion of the string • Left needle: parser’s stack (processed portion) • Operations • Shift: transfers a symbol from the right needle to the left needle • Reduction: the symbols at the top of the parse stack (left needle) that match a rule’s RHS are replaced by the rule’s LHS, A • (Fig. 6.1)

  9. LR Parsing Engine • A simple driver for a shift-reduce parser • Fig. 6.3 • Driven by a table (Sec. 6.2.4) • Indexed by the parser’s current state and the next input symbol • Current state: top of the parser stack • Shift and reduce actions are performed until • Accepted: input is reduced to the goal symbol • Error: no valid action found

  10. (Figure residue, apparently the driver of Fig. 6.3; only the operation names survive: push, peek, push, advance, pop, prepend, error)
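The driver loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of Fig. 6.3: the three-rule grammar, the hand-built tables `SHIFT` and `REDUCE`, and the function name `parse` are all hypothetical. As in the operations listed on the slide, a reduction pops the RHS and prepends the LHS to the input, so nonterminals are shifted like tokens.

```python
from collections import deque

# Hypothetical grammar, chosen only for illustration:
#   1: Start -> S $
#   2: S     -> a S b
#   3: S     -> c
RULES = {1: ("Start", 2), 2: ("S", 3), 3: ("S", 1)}  # rule -> (LHS, |RHS|)

# Hand-built LR(0) tables for that grammar (assumed, not from the slides).
SHIFT = {
    0: {"S": 1, "a": 2, "c": 3},
    1: {"$": 4},
    2: {"S": 5, "a": 2, "c": 3},
    5: {"b": 6},
}
REDUCE = {3: 3, 4: 1, 6: 2}  # LR(0) reduce states ignore the lookahead


def parse(tokens):
    """Return the list of actions taken, or raise SyntaxError."""
    stack = [0]              # parse stack of states; the top is the current state
    inp = deque(tokens)      # unprocessed input
    trace = []
    while True:
        state = stack[-1]
        if state in REDUCE:
            rule = REDUCE[state]
            lhs, length = RULES[rule]
            del stack[-length:]          # pop one state per RHS symbol
            trace.append(f"reduce {rule}")
            if lhs == "Start":           # input reduced to the goal symbol
                return trace
            inp.appendleft(lhs)          # prepend the LHS to the input
            continue
        symbol = inp[0] if inp else None  # peek
        target = SHIFT.get(state, {}).get(symbol)
        if target is None:               # blank table entry: error
            raise SyntaxError(f"unexpected {symbol!r} in state {state}")
        stack.append(target)             # push
        inp.popleft()                    # advance
        trace.append(f"shift {symbol}")


print(parse(["a", "c", "b", "$"]))
```

Reading only the reductions in the trace gives rules 3, 2, 1, which is the rightmost derivation Start => S $ => a S b $ => a c b $ in reverse.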

  11. LR Parse Table • Given a sentential form, the handle is defined as the sequence of symbols that will next be replaced by reduction • How to identify the handle • Which production to employ • (Fig. 6.4 & Fig. 6.5)

  12. In Fig. 6.5: • [s]: Shift to state s • r: reduction by rule r • Blank: error actions • A bottom-up parse of “a b b d c $” • Fig. 6.6 & Fig. 6.7 • A rightmost derivation in reverse • Shift actions are implied by inability to perform a useful reduction • Tokens are shifted until a handle appears

  13. LR(k) Parsing • Concept of LR parsing introduced by Knuth in 1965 • LR(k) • k: number of lookahead symbols used in constructing the parse table • LR(0) and LR(1): one symbol of lookahead at parse time • Number of columns in the parse table: n^k, where n is the number of terminal symbols

  14. Properties of LR(k) parsers • Shifting symbols and examining lookahead until the end of the handle is found • The handle is reduced to a nonterminal • Determine whether to shift or reduce, based on the symbols already shifted (left context) and the next k lookahead symbols (right context) • A grammar is LR(k) iff it’s possible to construct an LR parse table such that k tokens of lookahead allow the parser to recognize exactly the strings in the grammar’s language • Deterministic: each cell in the LR parse table contains only one entry

  15. Formal Definition of LR(k) Grammars • A grammar is LR(k) iff the following conditions imply αAy = γBx • S =>*rm αAw =>rm αβw • S =>*rm γBx =>rm αβy • Firstk(w) = Firstk(y) • LR(k) parsers can always determine the correct reduction (A → β) given • The left context (αβ) up to the end of the handle • The next k symbols (Firstk(w)) of the input

  16. LR(0) Table Construction • (Fig. 6.2) • E → plus E E • LR(0) item • A grammar production with a bookmark that indicates the current progress through the production’s RHS • Fresh: E → . plus E E • Reducible: E → plus E E . • (Fig. 6.8)

  17. Parser state: a set of LR(0) items • LR(0) construction algorithm • Fig. 6.9 & Fig. 6.10 • ComputeGoto • Closure of state s • Transitions from s • E.g.: Fig. 6.11 • Kernel of state s • A DFA called CFSM (characteristic finite-state machine)

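  18. (Figure residue, apparently the LR(0) construction algorithm of Fig. 6.9; procedure names as far as recoverable: Compute…, ProductionsFor, AddState, ExtractElement, ComputeGoto, AddState, AdvanceDot)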

  19. (Figure residue, apparently Fig. 6.10; procedure names as far as recoverable: Closure, ProductionsFor, ComputeGoto, Closure, AdvanceDot, AddState)
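The closure and goto steps of the LR(0) construction can be sketched as follows. This is a minimal sketch, not the algorithm of Fig. 6.9 and Fig. 6.10 verbatim: the slides show only the rule E → plus E E, so the rules Start → E $ and E → num are assumptions added to make a complete grammar, and the names `closure`, `goto`, and `build_cfsm` are illustrative. An item is represented as a pair (rule index, dot position).

```python
# Assumed grammar: only "E -> plus E E" appears on the slides;
# "Start -> E $" and "E -> num" are added here for illustration.
GRAMMAR = [
    ("Start", ("E", "$")),
    ("E", ("plus", "E", "E")),
    ("E", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add a fresh item B -> . gamma for every nonterminal B that follows a dot."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Advance the dot over `symbol` wherever possible, then close the kernel."""
    kernel = {(r, d + 1) for r, d in state
              if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol}
    return closure(kernel) if kernel else None


def build_cfsm():
    """Return (states, edges); edges maps (state index, symbol) to a state index."""
    start = closure({(0, 0)})
    states, index, edges = [start], {start: 0}, {}
    i = 0
    while i < len(states):                # states doubles as the worklist
        state = states[i]
        symbols = {GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}
        for x in sorted(symbols):
            target = goto(state, x)
            if target not in index:       # a state not seen before
                index[target] = len(states)
                states.append(target)
            edges[(i, x)] = index[target]
        i += 1
    return states, edges


states, edges = build_cfsm()
print(len(states), "states,", len(edges), "transitions")
```

States are frozensets of items, so two kernels with the same closure collapse into one state, which is what keeps the construction finite. The state numbering produced here depends on the order in which symbols are explored and need not match the figures.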

  20. CFSM recognizes its grammar’s viable prefixes • Viable prefix: any prefix that does not extend beyond its handle • Accept state in CFSM: a viable prefix that ends with a handle • Reduction • (Fig. 6.12)

  21. For an LR(0) grammar, the following properties hold • Given a syntactically correct input string, the CFSM will block only in double-boxed states • There’s at most one item in any double-boxed state • If the input string is syntactically invalid, the parser will enter a state in which the offending symbol cannot be shifted • To complete the parse table • (Fig. 6.13 & 6.14) • E.g.: (Fig. 6.15)

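  22. (Figure residue, apparently the table-completion algorithm of Fig. 6.13; procedure names as far as recoverable: CompleteTable, ComputeLookahead, TryRuleInState, AssertEntry, AssertEntry, ReportConflict)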

  23. (Figure residue, apparently Fig. 6.14; procedure names as far as recoverable: ComputeLookahead, TryRuleInState, AssertEntry, TryRuleInState)

  24. Conflict Diagnosis • A parse table conflict arises when the table-construction method cannot decide between multiple alternatives for some table entry • Shift/reduce conflicts • Reduce/reduce conflicts • Reasons for conflicts • Grammar is ambiguous • Grammar is not ambiguous, but the current table-building approach cannot resolve the conflict • Use more lookahead • Use a more powerful method

  25. Ambiguous Grammars

  26. Using state 5 in Fig. 6.16 as an example, the steps taken to understand a conflict • Determine a sequence of vocabulary symbols that causes the parser to move from the start state to the inadequate state • E plus E • We obtain a snapshot • E plus E . plus E • (Fig. 6.17)
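The first diagnosis step, finding a symbol sequence that leads from the start state to an inadequate state, can be sketched with a breadth-first walk over the LR(0) states. The grammar below is an assumption: the slides show only the sequence "E plus E", so the rules Start → E $, E → E plus E, and E → num are a plausible reconstruction of the ambiguous grammar, not a quotation of Fig. 6.16. The function name `inadequate_states` is illustrative.

```python
from collections import deque

# Assumed ambiguous grammar (reconstructed for illustration only).
GRAMMAR = [
    ("Start", ("E", "$")),
    ("E", ("E", "plus", "E")),
    ("E", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add fresh items for every nonterminal that follows a dot."""
    result, work = set(items), list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Advance the dot over `symbol`, then close the resulting kernel."""
    return closure({(r, d + 1) for r, d in state
                    if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol})


def inadequate_states():
    """Return (shortest symbol path, conflict kind) for each conflicting state."""
    start = closure({(0, 0)})
    paths = {start: ()}                  # state -> shortest path reaching it
    queue, found = deque([start]), []
    while queue:
        state = queue.popleft()
        reducible = [r for r, d in state if d == len(GRAMMAR[r][1])]
        shiftable = {GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}
        if reducible and shiftable - NONTERMINALS:
            found.append((paths[state], "shift/reduce"))
        if len(reducible) > 1:
            found.append((paths[state], "reduce/reduce"))
        for x in sorted(shiftable):
            target = goto(state, x)
            if target not in paths:      # first visit gives the shortest path
                paths[target] = paths[state] + (x,)
                queue.append(target)
    return found


print(inadequate_states())
```

For this grammar the only conflict reported is a shift/reduce conflict reached by E plus E: the state holds both the reducible item E → E plus E . and the item E → E . plus E, which is the snapshot E plus E . plus E described above.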

  27. Top parse tree • Reduction • Left-associative grouping for addition • Bottom parse tree • Shift • Right-associative grouping for addition • -> we eliminate the ambiguity by creating a grammar that favors left-association • (Fig. 6.18)

  28. Grammars that are not LR(k)

  29. Reduce/reduce conflict • Start=>rm Exprs $ =>rm E a $ =>rm E plus num a $ =>*rm E plus … plus num a $ =>rm num plus … plus num a $

  30. Conflict Resolution and Table Construction • Increasingly sophisticated lookahead techniques to resolve conflicts • SLR(k): simple • LALR(k) • LR(k): the most powerful

  31. SLR(k) Table Construction • SLR(k): Simple LR with k tokens of lookahead • A grammar that is not LR(0): Fig. 6.20 • Input string: num plus num times num $

  32. Replacing a terminal by a nonterminal whose role in the grammar is equivalent • (Fig. 6.21) • LR(0) construction: (Fig. 6.22) • Shift/reduce conflict of state 6 • Shift: (can continue as in Fig. 6.21) • Reduce: block in state 3 • E times num $ is not a valid sentential form • E -> E plus T is appropriate under some conditions

  33. For sentential forms • E plus T $ • E plus T plus num $ • If the reduction can lead to a successful parse, then plus can appear next to E in some valid sentential form • plus ∈ Follow(E) • TryRuleInState(): (Fig. 6.23) • SLR(1) parse table: (Fig. 6.24)
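The SLR(1) rule, reduce by A → γ only when the next token is in Follow(A), can be sketched by computing Follow sets with a fixed-point iteration. The grammar below is an assumed reconstruction: the slides mention only E -> E plus T and the input num plus num times num $, so the remaining rules are a guess at Fig. 6.20. The sketch also assumes no rule has an empty RHS, which keeps the First computation trivial, and the function `slr_actions` is a hypothetical stand-in for what TryRuleInState() decides in the conflicting state.

```python
# Assumed grammar (reconstruction, not quoted from the slides); no empty RHS.
GRAMMAR = [
    ("Start", ("E", "$")),
    ("E", ("E", "plus", "T")),
    ("E", ("T",)),
    ("T", ("T", "times", "num")),
    ("T", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """First(A) for each nonterminal; valid because no RHS is empty."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    """Follow(A): terminals that can appear right after A in a sentential form."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):     # something follows sym in this rule
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                    # sym ends the rule: inherit Follow(lhs)
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def slr_actions(lookahead):
    """Actions in the state holding E -> E plus T .  and  T -> T . times num."""
    actions = []
    if lookahead == "times":
        actions.append("shift")
    if lookahead in follow_sets()["E"]:
        actions.append("reduce E -> E plus T")
    return actions


print(follow_sets()["E"], slr_actions("times"), slr_actions("plus"))
```

Under these assumptions Follow(E) contains plus and $ but not times, so the state that was inadequate under LR(0) gets exactly one action per lookahead token: shift on times, reduce on plus or $.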
