Chap. 6, Bottom-Up Parsing J. H. Wang May 17, 2011
Outline • Overview • Shift-Reduce Parsers • LR(0) Table Construction • Conflict Diagnosis • Conflict Resolution and Table Construction
Overview • Problems in top-down parsers • Left-recursion • Common prefixes • (Fig. 5.12 vs. Fig. 5.16) • Bottom-up parsers can handle the largest class of grammars that can be parsed deterministically
A bottom-up parser begins with the parse tree’s leaves and moves toward its root • A bottom-up parser traces a rightmost derivation in reverse • A bottom-up parser uses a grammar rule to replace the rule’s RHS with its LHS • (Fig. 4.5 & Fig. 4.6)
Bottom-up: from terminal symbols to the goal symbol • Shift-reduce: the two most prevalent actions • Shift symbols onto the parse stack • Reduce a string of symbols on top of the stack to a nonterminal • LR(k): scan the input from the left, producing a rightmost derivation in reverse, using k symbols of lookahead • LR parsers are more general than LL parsers • Yacc: LR parser generator
Shift-Reduce Parsers • LR parsers and rightmost derivations • LR parsers construct rightmost derivations in reverse • Fig. 6.2 • LR parsing as knitting • How the RHS of a production is found • Fig. 6.1
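A small sketch of what "rightmost derivation in reverse" means. It assumes the prefix-expression grammar quoted on the LR(0) slide (E -> plus E E), an alternative E -> num, and an augmenting rule Start -> E $; the rule numbers and the function name replay_in_reverse are made up for illustration. Replaying a bottom-up parser's reductions backwards, always expanding the rightmost nonterminal, reproduces the derivation.

```python
# Assumed grammar; rule numbers are illustrative, not taken from the figures.
RULES = {
    1: ("Start", ["E", "$"]),
    2: ("E", ["plus", "E", "E"]),
    3: ("E", ["num"]),
}
NONTERMINALS = {"Start", "E"}


def replay_in_reverse(reductions):
    """Replay a bottom-up parser's reductions backwards as a derivation."""
    form = ["Start"]
    steps = [" ".join(form)]
    for rule in reversed(reductions):
        lhs, rhs = RULES[rule]
        # A rightmost derivation always expands the rightmost nonterminal.
        pos = max(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        assert form[pos] == lhs, "not the reverse of a rightmost derivation"
        form[pos:pos + 1] = rhs
        steps.append(" ".join(form))
    return steps


# Reductions in the order a shift-reduce parse of
# "plus plus num num num $" would perform them.
for step in replay_in_reverse([3, 3, 2, 3, 2, 1]):
    print(step)
```

Each printed line is one sentential form, starting at Start and ending at the input string.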
In Fig. 6.1: • Right needle: unprocessed portion of the string • Left needle: parser’s stack (processed portion) • Operations • Shift: transfers a symbol from right needle to left needle • Reduction: symbols α at the top of the parse stack (left needle) that form the RHS of a rule A -> α are replaced by A • (Fig. 6.1)
LR Parsing Engine • A simple driver for a shift-reduce parser • Fig. 6.3 • Driven by a table (Sec. 6.2.4) • Indexed by the parser’s current state and the next input symbol • Current state: top of the parser stack • Shift and reduce actions are performed until • Accepted: input is reduced to the goal symbol • Error: no valid action found
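The knitting picture can be sketched with two containers, one per needle. This is only an illustration under assumptions: the grammar is the prefix-expression grammar E -> plus E E | num, the name knit is hypothetical, and reducing as soon as an RHS appears on top of the stack happens to be correct for this small grammar. A real LR parser consults its parse table instead.

```python
from collections import deque

# Assumed grammar: E -> plus E E | num
PRODUCTIONS = [("E", ["plus", "E", "E"]), ("E", ["num"])]


def knit(tokens):
    left = []               # left needle: the parse stack (processed part)
    right = deque(tokens)   # right needle: the unprocessed input
    trace = []
    while True:
        for lhs, rhs in PRODUCTIONS:
            if left[-len(rhs):] == rhs:
                # Reduction: the RHS on top of the stack is replaced by the LHS.
                del left[-len(rhs):]
                left.append(lhs)
                trace.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            if not right:
                break
            # Shift: move one symbol from the right needle to the left needle.
            left.append(right.popleft())
            trace.append(f"shift {left[-1]}")
    return left, trace


stack, trace = knit(["plus", "plus", "num", "num", "num"])
print("\n".join(trace))
print("final stack:", stack)
```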
(Fig. 6.3: pseudocode of the shift-reduce driver; stack and input operations used: push, peek, advance, pop, prepend, error)
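A minimal sketch of a table-driven driver in the spirit of the operations listed above (push, peek, advance, pop, prepend, error). Everything concrete here is an assumption made for illustration: the grammar (Start -> E $, E -> plus E E | num), the rule numbers, the state numbers, and the hand-built table are not the ones from Fig. 6.3 to Fig. 6.5. After a reduction the LHS is prepended to the input, so a nonterminal is shifted like any other symbol.

```python
from collections import deque

# Assumed grammar with illustrative rule numbers.
RULES = {1: ("Start", ["E", "$"]), 2: ("E", ["plus", "E", "E"]), 3: ("E", ["num"])}

SHIFT, REDUCE, ACCEPT = "shift", "reduce", "accept"

# Hand-built table for the assumed grammar: (state, symbol) -> action.
TABLE = {
    (0, "E"): (SHIFT, 1), (0, "plus"): (SHIFT, 2), (0, "num"): (SHIFT, 3),
    (1, "$"): (ACCEPT, None),
    (2, "E"): (SHIFT, 4), (2, "plus"): (SHIFT, 2), (2, "num"): (SHIFT, 3),
    (4, "E"): (SHIFT, 5), (4, "plus"): (SHIFT, 2), (4, "num"): (SHIFT, 3),
}
# States 3 and 5 hold a single completed item, so they reduce on any terminal.
for terminal in ("plus", "num", "$"):
    TABLE[(3, terminal)] = (REDUCE, 3)
    TABLE[(5, terminal)] = (REDUCE, 2)


def parse(tokens):
    """Return the list of rule numbers applied, in reduction order."""
    stack = [0]                        # stack of states; top is the current state
    remaining = deque(tokens)
    reductions = []
    while True:
        state = stack[-1]
        symbol = remaining[0] if remaining else None   # peek
        action = TABLE.get((state, symbol))
        if action is None:                              # blank entry: error
            raise SyntaxError(f"unexpected {symbol!r} in state {state}")
        kind, arg = action
        if kind == ACCEPT:
            return reductions
        if kind == SHIFT:
            stack.append(arg)                           # push
            remaining.popleft()                         # advance
        else:
            lhs, rhs = RULES[arg]
            del stack[-len(rhs):]                       # pop one state per RHS symbol
            remaining.appendleft(lhs)                   # prepend the LHS
            reductions.append(arg)


print(parse(["plus", "plus", "num", "num", "num", "$"]))
```

In this sketch, acceptance is recorded directly in state 1 on `$` rather than by a final reduction with rule 1; that is a simplification, not a claim about the table in the figures.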
LR Parse Table • Given a sentential form, the handle is defined as the sequence of symbols that will next be replaced by reduction • How to identify the handle • Which production to employ • (Fig. 6.4 & Fig. 6.5)
In Fig. 6.5: • [s]: Shift to state s • r: reduction by rule r • Blank: error actions • A bottom-up parse of “a b b d c $” • Fig. 6.6 & Fig. 6.7 • A rightmost derivation in reverse • Shift actions are implied by inability to perform a useful reduction • Tokens are shifted until a handle appears
LR(k) Parsing • Concept of LR parsing introduced by Knuth in 1965 • LR(k) • k: number of lookahead symbols used in constructing the parse table • LR(0) and LR(1) parsers both use one symbol of lookahead at parse time • Number of columns in parse table: n^k
Properties of LR(k) parsers • Shifting symbols and examining lookahead until the end of the handle is found • Handle is reduced to a nonterminal • Determine whether to shift or reduce, based on the symbols already shifted (left context) and the next k lookahead symbols (right context) • A grammar is LR(k) iff it is possible to construct an LR parse table such that k tokens of lookahead allow the parser to recognize exactly the strings in the grammar’s language • Deterministic: each cell in the LR parse table contains only one entry
Formal Definition of LR(k) Grammars • A grammar is LR(k) iff the following conditions imply αAy = γBx • S =>*rm αAw =>rm αβw • S =>*rm γBx =>rm αβy • First_k(w) = First_k(y) • LR(k) parsers can always determine the correct reduction (A -> β) given • The left context (αβ) up to the end of the handle • The next k symbols (First_k(w)) of the input
LR(0) Table Construction • (Fig. 6.2) • E -> plus E E • LR(0) item • A grammar production with a bookmark that indicates the current progress through the production’s RHS • Fresh: E -> . plus E E • Reducible: E -> plus E E . • (Fig. 6.8)
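One way to represent an LR(0) item in code is a production plus an integer bookmark. The class below is a sketch; the name Item and its methods are illustrative and not taken from the figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    lhs: str
    rhs: tuple
    dot: int = 0   # bookmark: number of RHS symbols already seen

    def is_fresh(self):
        return self.dot == 0

    def is_reducible(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        # Symbol just after the bookmark, or None for a reducible item.
        return None if self.is_reducible() else self.rhs[self.dot]

    def advance(self):
        if self.is_reducible():
            raise ValueError("bookmark is already at the end of the RHS")
        return Item(self.lhs, self.rhs, self.dot + 1)

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, ".")
        return f"{self.lhs} -> {' '.join(symbols)}"


item = Item("E", ("plus", "E", "E"))
print(item)                                  # the fresh item
print(item.advance().advance().advance())    # the reducible item
```

Making the class frozen keeps items hashable, so a parser state can be stored as a set of items.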
Parser state: a set of LR(0) items • LR(0) construction algorithm • Fig. 6.9 & Fig. 6.10 • ComputeGoto • Closure of state s • Transitions from s • E.g.: Fig. 6.11 • Kernel of state s • A DFA called CFSM (characteristic finite-state machine)
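(Fig. 6.9: pseudocode for the LR(0) construction; procedures used include ProductionsFor, AddState, ExtractElement, ComputeGoto, AdvanceDot)
(Fig. 6.10: pseudocode for Closure and ComputeGoto; procedures used include ProductionsFor, AdvanceDot, AddState)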
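The closure and goto steps can be sketched compactly as a worklist algorithm. This is not a transcription of Fig. 6.9 or Fig. 6.10: the function names closure, goto, and build_cfsm, the encoding of an item as a (rule index, dot position) pair, and the grammar (Start -> E $, E -> plus E E | num) are assumptions made for the example.

```python
# Assumed grammar; rule 0 is the augmenting rule.
GRAMMAR = [
    ("Start", ("E", "$")),
    ("E", ("plus", "E", "E")),
    ("E", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add a fresh item for every nonterminal that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    work.append((index, 0))
    return frozenset(result)


def goto(state, symbol):
    """Advance the dot over `symbol` in every item that allows it (the kernel),
    then close the resulting item set."""
    kernel = {
        (rule, dot + 1)
        for rule, dot in state
        if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol
    }
    return closure(kernel) if kernel else None


def build_cfsm():
    start = closure({(0, 0)})
    states, index, edges = [start], {start: 0}, {}
    work = [start]
    while work:
        state = work.pop()
        symbols = {GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in index:
                index[target] = len(states)
                states.append(target)
                work.append(target)
            edges[(index[state], symbol)] = index[target]
    return states, edges


states, edges = build_cfsm()
print(len(states), "states,", len(edges), "transitions")
```

The state numbering depends on the order in which the worklist is processed, so it need not match the numbering in the figures.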
CFSM recognizes its grammar’s viable prefixes • Viable prefix: any prefix that does not extend beyond its handle • Accept state in CFSM: a viable prefix that ends with a handle • Reduction • (Fig. 6.12)
For an LR(0) grammar, the following properties hold • Given a syntactically correct input string, the CFSM will block only in double-boxed states • There’s at most one item in any double-boxed state • If the input string is syntactically invalid, the parser will enter a state in which the offending symbol cannot be shifted • To complete the parse table • (Fig. 6.13 & 6.14) • E.g.: (Fig. 6.15)
(Fig. 6.13: pseudocode for CompleteTable; procedures used: ComputeLookahead, TryRuleInState, AssertEntry, ReportConflict)
(Fig. 6.14: pseudocode for ComputeLookahead and TryRuleInState in the LR(0) case; procedure used: AssertEntry)
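A rough sketch of completing one table row in the LR(0) style: a reducible item claims every terminal column of its state, and a second, different claim on the same entry is reported as a conflict. The function name complete_lr0_state, the item encoding, and the two example item sets are assumptions for illustration and are not copied from Fig. 6.13 to Fig. 6.15.

```python
TERMINALS = ["plus", "num", "$"]


def complete_lr0_state(state_id, items):
    """Items are (lhs, rhs, dot) triples. Returns (row, conflicts)."""
    row, conflicts = {}, []

    def assert_entry(symbol, action):
        previous = row.get(symbol)
        if previous is not None and previous != action:
            both_reduce = previous[0] == "reduce" and action[0] == "reduce"
            kind = "reduce/reduce" if both_reduce else "shift/reduce"
            conflicts.append((state_id, symbol, kind))
        else:
            row[symbol] = action

    for lhs, rhs, dot in items:
        if dot == len(rhs):
            # LR(0): no lookahead is consulted, so reduce in every column.
            for terminal in TERMINALS:
                assert_entry(terminal, ("reduce", lhs, rhs))
        elif rhs[dot] in TERMINALS:
            # Target states are omitted to keep the sketch short.
            assert_entry(rhs[dot], ("shift",))
    return row, conflicts


# An adequate state: a single reducible item.
adequate = [("E", ("num",), 1)]
# An inadequate state of the kind examined under Conflict Diagnosis
# (item set assumed): E -> E plus E .  together with  E -> E . plus E
inadequate = [("E", ("E", "plus", "E"), 3), ("E", ("E", "plus", "E"), 1)]

print(complete_lr0_state(3, adequate)[1])
print(complete_lr0_state(5, inadequate)[1])
```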
Conflict Diagnosis • A parse table conflict arises when the table-construction method cannot decide between multiple alternatives for some table entry • Shift/reduce conflicts • Reduce/reduce conflicts • Reasons for conflicts • Grammar is ambiguous • Grammar is not ambiguous, but the current table-building approach cannot resolve the conflict • Given more lookahead • Use a more powerful method
Using state 5 in Fig.6.16 as an example, the steps taken to understand conflicts • Determine a sequence of vocabulary symbols that cause the parse to move from the start state to the inadequate state • E plus E • We obtain a snapshot • E plus E . plus E • (Fig. 6.17)
Top parse tree • Reduction • Left-associative grouping for addition • Bottom parse tree • Shift • Right-associative grouping for addition • -> we eliminate the ambiguity by creating a grammar that favors left-association • (Fig. 6.18)
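The two parse trees can be enumerated mechanically. The sketch below assumes the ambiguous grammar behind the conflict is E -> E plus E | num (the snapshot E plus E . plus E suggests this, but the grammar itself is not quoted here); the function name parses is illustrative. It tries every plus as the top-level operator.

```python
def parses(tokens):
    """All parse trees of `tokens` under the assumed grammar E -> E plus E | num."""
    if tokens == ["num"]:
        return ["num"]
    trees = []
    for i, token in enumerate(tokens):
        if token == "plus":
            # Treat this plus as the root and parse both sides recursively.
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    trees.append((left, "plus", right))
    return trees


for tree in parses(["num", "plus", "num", "plus", "num"]):
    print(tree)
```

One of the two trees groups the first addition first (the reduce choice, left-associative); the other groups the second addition first (the shift choice, right-associative).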
Reduce/reduce conflict • Start=>rm Exprs $ =>rm E a $ =>rm E plus num a $ =>*rm E plus … plus num a $ =>rm num plus … plus num a $
Conflict Resolution and Table Construction • Increasingly sophisticated lookahead techniques to resolve conflicts • SLR(k): simple • LALR(k) • LR(k): the most powerful
SLR(k) Table Construction • SLR(k): Simple LR with k tokens of lookahead • A grammar that is not LR(0): Fig. 6.20 • Input string: num plus num times num $
Replacing a terminal by a nonterminal whose role in the grammar is equivalent • (Fig. 6.21) • LR(0) construction: (Fig. 6.22) • Shift/reduce conflict of state 6 • Shift: (can continue as in Fig. 6.21) • Reduce: block in state 3 • E times num $ is not a valid sentential form • E -> E plus T is appropriate under some conditions
For sentential forms • E plus T $ • E plus T plus num $ • If the reduction can lead to a successful parse, then plus can appear next to E in some valid sentential form • plus ∈ Follow(E) • TryRuleInState(): (Fig. 6.23) • SLR(1) parse table: (Fig. 6.24)
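The SLR(1) idea of reducing only on symbols in Follow(LHS) can be sketched as follows. The grammar is an assumption reconstructed from the rules and strings quoted on these slides (Start -> E $, E -> E plus T | T, T -> T times num | num); the item set used for state 6 is likewise assumed, and the function names are illustrative. The Follow computation relies on the grammar having no empty productions.

```python
GRAMMAR = [
    ("Start", ("E", "$")),
    ("E", ("E", "plus", "T")),
    ("E", ("T",)),
    ("T", ("T", "times", "num")),
    ("T", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]   # enough because no production is empty
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    first = first_sets()
    follow = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, symbol in enumerate(rhs):
                if symbol not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    new = follow[lhs]   # symbol ends the RHS: inherit Follow(lhs)
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow


def slr_actions(items, follow):
    """Items are (lhs, rhs, dot). Reduce only on Follow(lhs); shift on terminals."""
    actions = {}
    for lhs, rhs, dot in items:
        if dot == len(rhs):
            for terminal in follow[lhs]:
                actions.setdefault(terminal, []).append(("reduce", lhs, rhs))
        elif rhs[dot] not in NONTERMINALS:
            actions.setdefault(rhs[dot], []).append(("shift",))
    return actions


follow = follow_sets()
# Assumed contents of the conflicting state: E -> E plus T .  and  T -> T . times num
STATE_6 = [("E", ("E", "plus", "T"), 3), ("T", ("T", "times", "num"), 1)]
for symbol, entry in sorted(slr_actions(STATE_6, follow).items()):
    print(symbol, entry)
```

Because times is not in Follow(E) under this grammar, the reduce action no longer competes with the shift on times, and every entry holds a single action.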