More Parsing CPSC 388 Ellen Walker Hiram College
Review LL(1) Grammars • Compute First and Follow sets • Build the parsing table M • If x is in First(A), then M[A,x] = A->xZ (the rule that put x in First(A)) • If e is in First(A) and x is in Follow(A), then M[A,x] = A->e (e denotes the empty string) • If each cell has no more than 1 rule, the grammar is LL(1).
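The table-building steps above can be sketched in Python. This is a minimal illustration, not the course's implementation: the example grammar S -> a S b | e, the function names, and the use of "e" and "$" as the empty-string and end-of-input markers are all assumptions made for the example.

```python
EPS = "e"   # empty string, written "e" as on the slides (assumption)
END = "$"   # end-of-input marker

# Hypothetical example grammar: S -> a S b | e
GRAMMAR = {"S": [["a", "S", "b"], [EPS]]}
START = "S"

def first_of_seq(seq, first):
    """First set of a sequence of grammar symbols."""
    out = set()
    for sym in seq:
        if sym == EPS:
            continue
        f = first[sym] if sym in first else {sym}  # a terminal starts with itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # every symbol in seq can vanish
    return out

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, rules in grammar.items():
            for rhs in rules:
                new = first_of_seq(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, rules in grammar.items():
            for rhs in rules:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:  # only nonterminals have Follow sets
                        continue
                    rest = first_of_seq(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

def build_table(grammar, start):
    """Map (nonterminal, terminal) to the list of rules placed in that cell."""
    first = compute_first(grammar)
    follow = compute_follow(grammar, start, first)
    table = {}
    for nt, rules in grammar.items():
        for rhs in rules:
            f = first_of_seq(rhs, first)
            targets = f - {EPS}
            if EPS in f:
                targets |= follow[nt]
            for x in targets:
                table.setdefault((nt, x), []).append(rhs)
    return table

def is_ll1(table):
    """LL(1) means no cell holds more than one rule."""
    return all(len(rules) <= 1 for rules in table.values())
```

For the example grammar, the cell for (S, a) holds S -> a S b, and the cells for (S, b) and (S, $) hold S -> e, so `is_ll1(build_table(GRAMMAR, START))` is true.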
LL(k) Grammars • Look at k terminals instead of 1 terminal • First(S) is all sequences of k terminals that can begin S • Follow(S) is all sequences of k terminals that can follow S • Column headers of the table are sequences of k terminals instead of single terminals • First & Follow computations get messy!
Building Parse Trees • Each item on the stack is a syntax tree node • To “use” a rule: • Pop (and save) LHS from stack. • Create nodes for each RHS element • Connect RHS nodes as children of LHS node • Push RHS nodes (reverse order) on stack
Parse Tree Example • Parsing: “aabb” • Grammar: S->aSb | e • After S->aSb: the stack (top first) holds a, S2, b; the tree is S1 with children a, S2, b
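The tree-building procedure can be sketched as a small table-driven parser in Python. This is an illustrative sketch only: the `Node` class, the hard-coded `TABLE` for S -> a S b | e, and the choice to show the empty rule as an "e" leaf are assumptions, not part of the slides.

```python
class Node:
    """A syntax tree node; the parser stack holds these."""
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []

    def __repr__(self):
        if not self.children:
            return self.symbol
        return f"{self.symbol}({' '.join(map(repr, self.children))})"

# LL(1) table for S -> a S b | e; an empty list stands for the rule S -> e.
TABLE = {("S", "a"): ["a", "S", "b"], ("S", "b"): [], ("S", "$"): []}

def parse(text):
    tokens = list(text) + ["$"]
    pos = 0
    root = Node("S")
    stack = [root]                      # top of stack is the end of the list
    while stack:
        node = stack.pop()              # pop (and save) the node
        tok = tokens[pos]
        if node.symbol != "S":          # terminal: must match the input
            if node.symbol != tok:
                raise SyntaxError(f"expected {node.symbol!r}, got {tok!r}")
            pos += 1
            continue
        if (node.symbol, tok) not in TABLE:
            raise SyntaxError(f"no rule for S on {tok!r}")
        rhs = TABLE[(node.symbol, tok)]
        rhs_nodes = [Node(s) for s in rhs]          # create RHS nodes
        node.children = rhs_nodes or [Node("e")]    # connect as children
        stack.extend(reversed(rhs_nodes))           # push in reverse order
    if tokens[pos] != "$":
        raise SyntaxError("input left over")
    return root

print(parse("aabb"))   # S(a S(a S(e) b) b)
```

Pushing the RHS nodes in reverse order leaves the leftmost symbol on top, so the tree is filled in left to right as the input is consumed.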
Error Recovery • Recognizer - either program is acceptable or not • Error Correction - attempt to replace error by correct program • Minimal distance error correction is too hard • Limited to simple errors (e.g. missing ;)
Error Recovery Principles • Find error as soon as possible (to report its location accurately) • Pick up parsing as soon as possible after error (so multiple errors caught) • Avoid errors generating many spurious additional error messages • Avoid infinite loops on errors (!)
Recursive Descent Error Recovery • Panic Mode • Each function has additional parameter: synchronizing tokens (e.g. ;) • Error causes parser to scan ahead (ignoring tokens) to find next synchronizing token • Typical synchronizing tokens are in follow set.
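Example Pseudocode
bool factor(set<token> synchset) {
  // skip ahead to a token that can start a factor, or to a synchronizing token
  token = scanto({'(', num}, synchset);
  switch (token) {
    case '(': match('('); exp({')'}); match(')'); break;
    case num: match(num); break;
    default: error("Factor"); return false;  // stopped on a synchronizing token
  }
  return true;
}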
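A runnable Python sketch of panic-mode recovery for the same idea follows. It assumes a tiny grammar, exp -> factor { + factor } and factor -> ( exp ) | num, with tokens given as a list of strings; the `Parser` class and the `checkinput` helper are hypothetical names for illustration, not the course's code.

```python
class Parser:
    FIRST_FACTOR = {"(", "num"}

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input
        self.pos = 0
        self.errors = []

    @property
    def token(self):
        return self.tokens[self.pos]

    def error(self, msg):
        self.errors.append(f"{msg} at position {self.pos}")

    def scanto(self, synchset):
        # Panic mode: ignore tokens until a synchronizing token (or the end).
        while self.token not in synchset and self.token != "$":
            self.pos += 1

    def checkinput(self, expected, synchset):
        if self.token not in expected:
            self.error(f"unexpected {self.token!r}")
            self.scanto(expected | synchset)

    def match(self, expected):
        if self.token == expected:
            self.pos += 1
        else:
            self.error(f"expected {expected!r}")

    def exp(self, synchset):
        self.factor(synchset | {"+"})
        while self.token == "+":
            self.match("+")
            self.factor(synchset | {"+"})

    def factor(self, synchset):
        self.checkinput(self.FIRST_FACTOR, synchset)
        if self.token == "(":
            self.match("(")
            self.exp(synchset | {")"})
            self.match(")")
        elif self.token == "num":
            self.match("num")
        # Otherwise we stopped on a synchronizing token: give up on this factor.
        self.checkinput(synchset, self.FIRST_FACTOR)

def parse(tokens):
    """Parse an expression and return the list of error messages."""
    p = Parser(tokens)
    p.exp({"$"})
    return p.errors
```

For example, `parse(["num", "+", "*", "num"])` reports a single error for the stray `*` and then resumes at the following `num`, rather than stopping or producing a cascade of messages.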
Error Recovery in LL(1) • Fill in each “blank” cell with one of the following options: • Pop: pop A from the stack (if current token is $ or in Follow(A)). “give up on” A • Scan: skip tokens until we find one where we can restart the parse. • Push a new nonterminal (e.g. start symbol if stack becomes empty before input does)
Bottom Up Parsing • Start with tokens • Build up rule RHS (right side) • Replace RHS by LHS • Done when stack is only start symbol • (Working from leaves of tree to root)
Operations in Bottom-up Parsing • Shift: • Push the terminal from the beginning of the string to the top of the stack • Reduce • Replace the string xyz at the top of the stack by a nonterminal A (assuming A->xyz) • Accept (when stack is $S’; empty input)
Lookahead • Look ahead in input by shifting (it’ll all be in the stack) • Look ahead in the stack • This requires breaking the abstraction just a little bit (but is technically no problem) • As before, decision to shift or reduce is made based on next token and stack
Sample Parse • S’ -> S; S-> aSb | bSa | SS | e • String: abba • Stack = $, input = abba$; shift • Stack = $a input = bba$; reduce S->e • Stack = $aS input = bba$ ; shift • Stack = $aSb input = ba$ ; reduce S->aSb • Stack = $S input = ba$ ; shift
Sample Parse (cont) • Stack = $S input = ba$ ; shift • Stack = $Sb input = a$ ; reduce S->e • Stack = $SbS input = a$ ; shift • Stack = $SbSa input = $; reduce S->bSa • Stack = $SS input = $; reduce S->SS • Stack = $S input = $; reduce S’-> S • Stack = $S’ input = $; accept
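The trace can be checked mechanically by replaying the shift and reduce actions against a stack. The sketch below is an assumption-laden illustration: it does not decide which action to take (a real LR parser uses a table for that), it only verifies that a given action sequence is legal; the `RULES` dictionary and `replay` function are hypothetical names, and S’ is written with an ASCII apostrophe.

```python
# Rules of the sample grammar, keyed by the labels used in the trace.
RULES = {
    "S'->S":  ("S'", ["S"]),
    "S->aSb": ("S", ["a", "S", "b"]),
    "S->bSa": ("S", ["b", "S", "a"]),
    "S->SS":  ("S", ["S", "S"]),
    "S->e":   ("S", []),            # e = empty string
}

def replay(text, actions):
    """Apply the actions in order; return (stack, input, action) per step."""
    stack = ["$"]
    inp = list(text) + ["$"]
    trace = []
    for action in actions:
        trace.append(("".join(stack), "".join(inp), action))
        if action == "shift":
            if inp[0] == "$":
                raise ValueError("nothing left to shift")
            stack.append(inp.pop(0))        # move next terminal onto the stack
        elif action == "accept":
            if stack != ["$", "S'"] or inp != ["$"]:
                raise ValueError("cannot accept in this state")
        else:
            lhs, rhs = RULES[action]
            if rhs:
                if stack[-len(rhs):] != rhs:
                    raise ValueError(f"{action}: RHS is not on top of the stack")
                del stack[-len(rhs):]       # pop the RHS ...
            stack.append(lhs)               # ... and push the LHS
    return trace

ACTIONS = ["shift", "S->e", "shift", "S->aSb", "shift", "S->e",
           "shift", "S->bSa", "S->SS", "S'->S", "accept"]

for stack, inp, action in replay("abba", ACTIONS):
    print(f"Stack = {stack:6} input = {inp:6} ; {action}")
```

Replaying the eleven actions on "abba" reproduces the stack and input columns of the sample parse and ends in the accepting state with stack $S' and input $.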
Rightmost Derivation • Reduce rules (in order used) • S->e • S->aSb • S->e • S->bSa • S-> SS • S’-> S
Rightmost Derivation • Rules read “upward” give the following derivation: • S’ -> S -> SS -> SbSa -> Sba -> aSbba -> abba • A shift-reduce parser generates a rightmost derivation in reverse order! • LR(k) = left-to-right input, rightmost derivation.
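Reading the reduce rules "upward" can be sketched in Python: start from S' and apply the reductions in reverse order, each time expanding the rightmost nonterminal. The list encoding of the rules and the function name are assumptions made for this example.

```python
NONTERMINALS = {"S", "S'"}

# Reductions in the order the parser used them, as (LHS, RHS symbols).
REDUCTIONS = [
    ("S", []),                 # S -> e
    ("S", ["a", "S", "b"]),    # S -> aSb
    ("S", []),                 # S -> e
    ("S", ["b", "S", "a"]),    # S -> bSa
    ("S", ["S", "S"]),         # S -> SS
    ("S'", ["S"]),             # S' -> S
]

def rightmost_derivation(reductions, start="S'"):
    """Return the sentential forms obtained by undoing the reductions."""
    form = [start]
    steps = ["".join(form)]
    for lhs, rhs in reversed(reductions):
        # Locate the rightmost nonterminal in the current form.
        idx = max(i for i, s in enumerate(form) if s in NONTERMINALS)
        if form[idx] != lhs:
            raise ValueError(f"rightmost nonterminal is {form[idx]}, not {lhs}")
        form[idx:idx + 1] = rhs     # expand it with the rule's RHS
        steps.append("".join(form))
    return steps

print(" -> ".join(rightmost_derivation(REDUCTIONS)))
# S' -> S -> SS -> SbSa -> Sba -> aSbba -> abba
```

Because every step expands the rightmost nonterminal without error, the reversed reduction sequence is indeed a rightmost derivation of abba.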
Right Sentential Form • Each intermediate term of a rightmost derivation is called a right sentential form • S’, S, SS, SbSa, Sba, aSbba, abba • All legal intermediate parser states are right sentential forms (split between the stack and the remaining input string)
Shift vs. Reduce • Shift until reduction to the next right sentential form is possible • That is, when a complete RHS is at the top of the stack • …and more of the RHS is not at the beginning of the remaining input. (Otherwise, S->e would always be used!)