Top-Down Parsing CPSC 388 Ellen Walker Hiram College
From Last Week • Push Down Automata recognize CFL’s • State machine augmented with stack • Transition is c,P;S where c is character read, P is symbol popped, and S is symbol pushed.
More From Last Week • Every CFL has a PDA that accepts it: • Non-terminals and terminals are pushed on stack • For every rule, pop the LHS, push the RHS • E.g. e,S;xSy (assume top is at right) • When an input character matches the top of the stack, eat the character and pop • E.g. a,a;e
Every PDA has a CFL... • Idea: • A goal of a PDA is <current state, stack-top, future state> • Overall goal is <Init,e,Final> • For each transition p -- x,y;z --> q make a rule <p,y,r> -> x<q,z,r> • Also make rules <p,e,p> -> e • Result: Correct grammar, many rules!
Why do we care? • We proved that every PDA has a corresponding CFL (by algorithm) • We proved that every CFL has a corresponding PDA (by earlier algorithm) • Therefore, PDAs and context-free grammars are equivalent: both describe exactly the same class of languages (the Context Free Languages)
Top Down Parsing • Begin with the start symbol S (top of tree) • Until the string has been read (parsed): • Choose a rule for the first non-terminal • Read characters for terminals and push symbols for non-terminals • If the stack is empty before the whole string has been read, or no rule can “eat” the current character, parse error!
But how to choose the rule? • Backtracking: • Systematically try everything • When you get stuck, change the most recent choice (“backtrack”) • If all choices are tried, then syntax error • Prediction • Look at current and future characters to pick the right choice
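The backtracking option can be sketched in C as follows. This is a minimal illustration, not code from the course: it assumes the grammar S -> aSa | bSb | e used later in these slides, keeps the PDA stack as a string with its top at index 0, and the names `derive`, `parse`, `RULES`, and `MAX_STACK` are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_STACK 256

/* Alternatives for the only non-terminal:  S -> aSa | bSb | e
   (the empty string "" stands for e). */
static const char *RULES[] = { "aSa", "bSb", "" };
enum { NUM_RULES = 3 };

/* stack: symbols still to be matched, top of stack at index 0.
   input: characters not yet read.
   Both are passed by value, so returning 0 "unreads" everything that
   this attempt consumed and the caller can try its next choice. */
static int derive(const char *stack, const char *input)
{
    if (*stack == '\0')                 /* stack empty: accept only at end of input */
        return *input == '\0';

    if (*stack == 'S') {                /* non-terminal: try every rule in turn */
        for (int r = 0; r < NUM_RULES; r++) {
            char next[MAX_STACK];
            if (strlen(RULES[r]) + strlen(stack + 1) >= MAX_STACK)
                continue;               /* sketch only: skip if the buffer is too small */
            strcpy(next, RULES[r]);     /* pop S, push the right-hand side */
            strcat(next, stack + 1);
            if (derive(next, input))
                return 1;
        }
        return 0;                       /* all choices tried: backtrack further */
    }

    /* terminal on top of the stack: it must match the current character */
    if (*input == *stack)
        return derive(stack + 1, input + 1);
    return 0;
}

/* Returns 1 if input is in the language, 0 for a syntax error. */
int parse(const char *input)
{
    assert(input != NULL);
    return derive("S", input);
}
```

For instance, `parse("abba")` tries S -> aSa first at every level, gets stuck at the end of the input, and only succeeds after the innermost choice is changed to S -> e. The fixed-size buffer limits this sketch to short inputs.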
Recursive Descent Parsing • Every non-terminal is a function • For each RHS of this symbol: • Call non-terminals in sequence, reading tokens as appropriate • If the whole RHS worked, return
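Example: Rule to Function • exp -> factor + factor | factor
int exp(){
  int res = factor();                      /* both alternatives start with factor */
  if (!res) return 0;                      /* no factor: fail */
  if (nexttoken() == '+') res = factor();  /* optional "+ factor" */
  return res;                              /* succeed only if the last factor did */
}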
Another Example • S -> aSa | bSb | e
int S(){
  if (nexttoken() == 'a' && S() && nexttoken() == 'a') return 1;  /* S -> aSa */
  if (nexttoken() == 'b' && S() && nexttoken() == 'b') return 1;  /* S -> bSb */
  return 1;                                                       /* S -> e */
}
What we’re not showing • If a parse fails, many input characters may have been read on the way to the failed parse. These must be “unread” before the next try. • This involves a bit of bookkeeping, but value parameters help • Recursive Descent is best for simple languages without too many choices
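One way to read the remark about value parameters: if each parsing function receives the input position by value, a failed alternative cannot disturb the caller's position, so nothing has to be explicitly unread. The sketch below applies this to exp -> factor + factor | factor. It is an illustration under assumptions: a factor is taken to be a single digit, and the names `factor`, `expr` (used instead of `exp` to avoid clashing with the C math library), and `parse_exp` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Each function takes the current position by value and returns the
   position just after what it parsed, or -1 on failure. */

/* Assumption for this sketch: a factor is one decimal digit. */
static int factor(const char *s, int pos)
{
    return isdigit((unsigned char)s[pos]) ? pos + 1 : -1;
}

/* exp -> factor + factor | factor */
static int expr(const char *s, int pos)
{
    /* Alternative 1: factor + factor */
    int p = factor(s, pos);
    if (p >= 0 && s[p] == '+') {
        int q = factor(s, p + 1);
        if (q >= 0)
            return q;
    }
    /* Alternative 2: factor. pos was never modified, so the characters
       read by the failed attempt are "unread" for free. */
    return factor(s, pos);
}

/* Returns 1 if the whole string is an exp, 0 otherwise. */
int parse_exp(const char *s)
{
    assert(s != NULL);
    int end = expr(s, 0);
    return end >= 0 && s[end] == '\0';
}
```

With this shape, `parse_exp("1+")` fails in the first alternative after reading two characters, falls back to the second alternative from position 0, and is then rejected because the `+` is left over.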
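Another Problem • Consider: • S -> Sab | e • S() calls itself before reading any character, so the recursion never ends (left recursion)
int S(){
  if (S() && nexttoken() == 'a' && nexttoken() == 'b') return 1;  /* S -> Sab: recurses forever */
  return 1;                                                       /* S -> e: never reached */
}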
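A common remedy, which is not shown on the slide, is to rewrite the rule so that it no longer starts with S. The grammar S -> Sab | e generates zero or more copies of "ab", so the same language can be recognized with a loop. The sketch below assumes that rewriting; the name `parse_S` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Recognizes the language of  S -> Sab | e, i.e. "", "ab", "abab", ...
   The left-recursive rule is replaced by iteration: S -> (ab)*.
   Returns 1 on success, 0 on a syntax error. */
int parse_S(const char *s)
{
    assert(s != NULL);
    size_t pos = 0;
    /* s[pos + 1] is safe to read: if s[pos] is 'a', the terminating
       '\0' is at pos + 1 at the earliest. */
    while (s[pos] == 'a' && s[pos + 1] == 'b')
        pos += 2;                 /* consume one "ab" per iteration */
    return s[pos] == '\0';        /* accept only if nothing is left over */
}
```

Every iteration reads input before continuing, so the function always terminates, unlike the recursive version above.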
Adding Prediction • No lookahead: case statement based on first character (e.g. “exp” in book) • Adding lookahead: consider two sets • All possible first characters of S • All possible characters that can follow S
Using First & Follow Sets • Use an explicit stack as in the CFL to PDA algorithm. • If the current character is in First(S), then it’s OK to pop S and push its rule beginning with this character • If the current character is in Follow(S), then it’s OK to pop S without pushing (i.e. use the S -> e rule)
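The two checks above can be sketched with an explicit stack. This is an illustration under assumptions, not the course's code: the grammar is taken to be S -> xSy | e (the rule from the earlier PDA example), for which First(S) = {x} and Follow(S) = {y, end of input}. The names `parse_ll1` and `MAX_STACK` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STACK 256

/* Predictive parser for  S -> xSy | e  using an explicit stack.
   First(S)  = { 'x' }
   Follow(S) = { 'y', end of input }   (end of input is the '\0' byte)
   Returns 1 on success, 0 on a syntax error. */
int parse_ll1(const char *input)
{
    assert(input != NULL);
    char stack[MAX_STACK];
    int top = 0;                      /* number of symbols on the stack */
    size_t pos = 0;                   /* index of the current character */

    stack[top++] = 'S';               /* begin with the start symbol */

    while (top > 0) {
        char cur = input[pos];
        char sym = stack[--top];      /* pop */

        if (sym == 'S') {
            if (cur == 'x') {
                /* cur is in First(S): use S -> xSy, pushed so x ends up on top */
                if (top + 3 > MAX_STACK)
                    return 0;         /* sketch only: input too deeply nested */
                stack[top++] = 'y';
                stack[top++] = 'S';
                stack[top++] = 'x';
            } else if (cur == 'y' || cur == '\0') {
                /* cur is in Follow(S): use S -> e, pop without pushing */
            } else {
                return 0;             /* no rule applies */
            }
        } else {
            if (cur != sym)
                return 0;             /* terminal on the stack must match the input */
            pos++;                    /* eat the character */
        }
    }
    return input[pos] == '\0';        /* stack empty: all input must be consumed */
}
```

Because the current character alone selects the rule, this parser never backtracks: each character is examined once.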