Recursive Descent Parsing (with combinators)
Greg Morrisett
Last Time
• We saw how to use combinators to build not just a lexer, but a parser.
• The only difference is that parsers are generally recursive.
• And that recursion can get us into trouble.
For Example
Suppose we have a grammar that looks like this:
  intlist -> INT intlist | <eps>
Using our Combinators
  intlist -> INT intlist | <eps>

  let int_p (ts : token list) =
    match ts with
    | (INT i)::rest -> [(i, rest)]
    | _ -> []

  let rec intlist_p = fun ts ->
    ((int_p $ intlist_p) % cons ++ eps) ts
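The slide relies on `$`, `%`, `++`, `eps` and `cons` from last time, which are not shown here. The sketch below assumes one plausible set of definitions in which a parser returns the list of all (result, remaining tokens) pairs. The `token` type, the `'a parser` alias and the combinator bodies are our assumptions, not the lecture's code; because our `eps` yields `()`, the empty case is written `eps % (fun () -> [])`.

```ocaml
(* Assumed token type; only INT is needed here. *)
type token = INT of int

(* Assumed representation: every way to parse a prefix of the input,
   as (result, remaining tokens) pairs. [] means failure. *)
type 'a parser = token list -> ('a * token list) list

(* Alternation: keep the results of both parsers. *)
let ( ++ ) (p1 : 'a parser) (p2 : 'a parser) : 'a parser =
  fun ts -> p1 ts @ p2 ts

(* Sequencing: run p2 on each remainder left by p1 and pair the results. *)
let ( $ ) (p1 : 'a parser) (p2 : 'b parser) : ('a * 'b) parser =
  fun ts ->
    List.concat_map
      (fun (a, ts1) -> List.map (fun (b, ts2) -> ((a, b), ts2)) (p2 ts1))
      (p1 ts)

(* Map a function over every result. *)
let ( % ) (p : 'a parser) (f : 'a -> 'b) : 'b parser =
  fun ts -> List.map (fun (a, rest) -> (f a, rest)) (p ts)

(* Succeed without consuming any tokens. *)
let eps : unit parser = fun ts -> [((), ts)]

let cons (x, xs) = x :: xs

let int_p : int parser = function
  | INT i :: rest -> [(i, rest)]
  | _ -> []

(* intlist -> INT intlist | <eps> *)
let rec intlist_p : int list parser = fun ts ->
  ((int_p $ intlist_p) % cons ++ eps % (fun () -> [])) ts
```

Under these definitions, `intlist_p [INT 1; INT 2]` returns every prefix parse, `[([1; 2], []); ([1], [INT 2]); ([], [INT 1; INT 2])]`, and only the first one consumes all of the input.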
A Manual Parser
  intlist -> INT intlist | <eps>

  let rec intlist_p ts =
    match ts with
    | (INT i)::rest ->
      let (ints, ts') = intlist_p rest in
      (i::ints, ts')
    | _ -> ([], ts)
For Example
But what if we instead wrote:
  intlist -> intlist INT | <eps>
Now the grammar is left-recursive: in the first alternative, we reach the non-terminal intlist before consuming any terminal.
Using our Combinators
  intlist -> intlist INT | <eps>

  let int_p (ts : token list) =
    match ts with
    | (INT i)::rest -> [(i, rest)]
    | _ -> []

  let rec intlist_p = fun ts ->
    ((intlist_p $ int_p) % cons_end ++ eps) ts
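To watch this version fail without hanging, the sketch below uses assumed list-of-successes combinators (our own definitions, not the lecture's) and adds a hypothetical call counter, `calls` with the exception `Left_recursion`, that exists only for this demonstration. `cons_end` is also an assumption: it appends one element to the end of a list.

```ocaml
(* Assumed token type and parser representation. *)
type token = INT of int
type 'a parser = token list -> ('a * token list) list

let ( ++ ) (p1 : 'a parser) (p2 : 'a parser) : 'a parser =
  fun ts -> p1 ts @ p2 ts

let ( $ ) (p1 : 'a parser) (p2 : 'b parser) : ('a * 'b) parser =
  fun ts ->
    List.concat_map
      (fun (a, ts1) -> List.map (fun (b, ts2) -> ((a, b), ts2)) (p2 ts1))
      (p1 ts)

let ( % ) (p : 'a parser) (f : 'a -> 'b) : 'b parser =
  fun ts -> List.map (fun (a, rest) -> (f a, rest)) (p ts)

let eps : unit parser = fun ts -> [((), ts)]

let int_p : int parser = function
  | INT i :: rest -> [(i, rest)]
  | _ -> []

(* Hypothetical helper: append x to the end of xs. *)
let cons_end (xs, x) = xs @ [x]

(* Hypothetical guard for this demo only: give up after too many calls
   instead of recursing forever. *)
exception Left_recursion
let calls = ref 0

(* intlist -> intlist INT | <eps> *)
let rec intlist_p : int list parser = fun ts ->
  incr calls;
  if !calls > 1_000 then raise Left_recursion;
  ((intlist_p $ int_p) % cons_end ++ eps % (fun () -> [])) ts
```

Calling `intlist_p [INT 1]` raises `Left_recursion`: each call to `intlist_p ts` makes another call to `intlist_p ts` on the same, unconsumed tokens before it can return any result.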
A Manual Parser
  intlist -> intlist INT | <eps>

  let rec intlist_p ts =
    let (ints, ts') = intlist_p ts in
    match ts' with
    | (INT i)::rest -> (ints @ [i], rest)
    | _ -> ([], ts)

Oops! That's definitely going to loop forever: the first thing intlist_p does is call itself on the same token list ts. So we want to avoid writing grammars that are left-recursive.
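A common way out, sketched here rather than taken from the slides, is to keep the left-recursive grammar but implement it with a loop. The rule intlist -> intlist INT | <eps> derives <eps> followed by zero or more INTs, so we can start from the <eps> case and extend the result one INT at a time, as the intlist INT alternative would. The `token` type is assumed.

```ocaml
(* Assumed token type. *)
type token = INT of int | PLUS

(* Iterative parser for  intlist -> intlist INT | <eps>.
   Returns the integers read (in input order) and the leftover tokens. *)
let intlist_p (ts : token list) : int list * token list =
  (* acc holds the integers read so far, most recent first *)
  let rec loop acc ts =
    match ts with
    | INT i :: rest -> loop (i :: acc) rest  (* intlist -> intlist INT *)
    | _ -> (List.rev acc, ts)                 (* no more INTs: stop *)
  in
  loop [] ts                                  (* start from intlist -> <eps> *)
```

For example, `intlist_p [INT 1; INT 2; PLUS]` evaluates to `([1; 2], [PLUS])`. Accumulating in reverse and calling `List.rev` once avoids the quadratic cost of repeated `ints @ [i]`.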
Another Example
  exp -> INT | exp '+' exp

  let rec exp_p ts =
    (int_p ++ (exp_p $ tok PLUS $ exp_p) % (function ((i,_),j) -> i+j)) ts
Inlining "++"
  let rec exp_p ts =
    (int_p ts) @ (((exp_p $ tok PLUS $ exp_p) % (function ((i,_),j) -> i+j)) ts)
Inlining "$" and "%"
  let rec exp_p ts =
    (int_p ts) @
    let s1 = exp_p ts in
    List.fold_right (fun (i,ts1) a -> ...)
Note: infinite loop!
In the inlined code, s1 = exp_p ts calls exp_p on exactly the same token list ts it was given, before any token is consumed, so the recursion never makes progress.
Refactoring the Grammar
  exp -> INT | exp '+' exp
becomes
  exp -> INT | INT '+' exp
This accepts the same strings, but is no longer left-recursive.
With our Combinators
  exp -> INT | INT '+' exp

  let rec exp_p ts =
    (int_p ++ (int_p $ tok PLUS $ exp_p) % (function ((i,_),j) -> i+j)) ts
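The sketch below makes this slide runnable under assumed combinator definitions in which a parser returns all (result, remaining tokens) pairs. The `token` type, the `'a parser` alias, `tok`, and the helper `complete` (which keeps only the parses that consume the whole input) are our own hypothetical definitions.

```ocaml
(* Assumed token type and parser representation. *)
type token = INT of int | PLUS
type 'a parser = token list -> ('a * token list) list

let ( ++ ) (p1 : 'a parser) (p2 : 'a parser) : 'a parser =
  fun ts -> p1 ts @ p2 ts

let ( $ ) (p1 : 'a parser) (p2 : 'b parser) : ('a * 'b) parser =
  fun ts ->
    List.concat_map
      (fun (a, ts1) -> List.map (fun (b, ts2) -> ((a, b), ts2)) (p2 ts1))
      (p1 ts)

let ( % ) (p : 'a parser) (f : 'a -> 'b) : 'b parser =
  fun ts -> List.map (fun (a, rest) -> (f a, rest)) (p ts)

let int_p : int parser = function
  | INT i :: rest -> [(i, rest)]
  | _ -> []

(* Match exactly the token t. *)
let tok (t : token) : unit parser = function
  | t' :: rest when t' = t -> [((), rest)]
  | _ -> []

(* exp -> INT | INT '+' exp *)
let rec exp_p ts =
  (int_p ++ (int_p $ tok PLUS $ exp_p) % (fun ((i, _), j) -> i + j)) ts

(* Keep only the results that used up every token. *)
let complete (p : 'a parser) (ts : token list) : 'a list =
  List.filter_map (fun (v, rest) -> if rest = [] then Some v else None) (p ts)
```

Here `exp_p [INT 1; PLUS; INT 2]` returns `[(1, [PLUS; INT 2]); (3, [])]`, and `complete exp_p [INT 1; PLUS; INT 2; PLUS; INT 3]` returns `[6]`.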
Unwinding the definitions
  let rec exp_p ts =
    (int_p ts) @
    let s1 = int_p ts in
    List.fold_right (fun (i,ts2) a ->
      match ts2 with
      | PLUS::ts3 -> let s2 = exp_p ts3 in ...
By the time we do the recursive call, the list of tokens is smaller: exp_p is applied to ts3, which no longer contains the INT and the PLUS that were just consumed.
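For comparison, here is one way the fully unwound parser could be written out with no combinators at all. It is our own expansion, not the lecture's elided code; it assumes the same list-of-successes convention and a hypothetical `token` type.

```ocaml
(* Assumed token type. *)
type token = INT of int | PLUS

(* exp -> INT | INT '+' exp, with every combinator inlined.
   Returns all (value, remaining tokens) pairs. *)
let rec exp_p (ts : token list) : (int * token list) list =
  (* Both alternatives start by reading one INT (inlined int_p). *)
  let s1 = match ts with INT i :: rest -> [(i, rest)] | _ -> [] in
  (* Alternative exp -> INT '+' exp: for each INT, expect PLUS, then recur. *)
  let plus_results =
    List.fold_right
      (fun (i, ts2) acc ->
         match ts2 with
         | PLUS :: ts3 ->
           (* ts3 is strictly shorter than ts, so the recursion makes progress *)
           List.map (fun (j, ts4) -> (i + j, ts4)) (exp_p ts3) @ acc
         | _ -> acc)
      s1 []
  in
  (* Alternative exp -> INT comes first, as in int_p ++ ... *)
  s1 @ plus_results
```

With this definition, `exp_p [INT 1; PLUS; INT 2]` returns `[(1, [PLUS; INT 2]); (3, [])]`.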
Let's Scale Up
  exp -> INT | exp '+' exp | exp '*' exp
In addition to the problem with left recursion, this grammar is ambiguous: an expression like "3 + 2 * 6" can be parsed as 3 + (2 * 6) or as (3 + 2) * 6, so we'll get multiple parse results.
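To see why the two parses matter, the sketch below writes both derivations of "3 + 2 * 6" as trees in a hypothetical abstract syntax (`exp`, `Num`, `Plus`, `Times` and `eval` are our own names) and evaluates them.

```ocaml
(* Hypothetical abstract syntax for  exp -> INT | exp '+' exp | exp '*' exp *)
type exp = Num of int | Plus of exp * exp | Times of exp * exp

let rec eval (e : exp) : int =
  match e with
  | Num n -> n
  | Plus (a, b) -> eval a + eval b
  | Times (a, b) -> eval a * eval b

(* Two different derivations of the same token string "3 + 2 * 6". *)
let plus_on_top = Plus (Num 3, Times (Num 2, Num 6))   (* 3 + (2 * 6) *)
let times_on_top = Times (Plus (Num 3, Num 2), Num 6)  (* (3 + 2) * 6 *)
```

`eval plus_on_top` is 15 while `eval times_on_top` is 30, so the parser's choice changes the meaning of the input.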
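Getting Rid of Left Recursion
  exp -> INT | INT '+' exp | INT '*' exp

  let rec exp_p ts =
    (int_p
     ++ (int_p $ tok PLUS $ exp_p) % ...
     ++ (int_p $ tok TIMES $ exp_p) % ...) ts
This removes the left recursion, but every operator now groups to the right and '*' no longer binds tighter than '+': "2 * 3 + 4" parses only as 2 * (3 + 4).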
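The sketch below runs this grammar under assumed list-of-successes combinators to show how it groups operators. The `token` type, `tok`, `complete` and the two semantic actions (addition and multiplication) are our own choices, since the slide elides them.

```ocaml
(* Assumed token type and parser representation. *)
type token = INT of int | PLUS | TIMES
type 'a parser = token list -> ('a * token list) list

let ( ++ ) (p1 : 'a parser) (p2 : 'a parser) : 'a parser =
  fun ts -> p1 ts @ p2 ts

let ( $ ) (p1 : 'a parser) (p2 : 'b parser) : ('a * 'b) parser =
  fun ts ->
    List.concat_map
      (fun (a, ts1) -> List.map (fun (b, ts2) -> ((a, b), ts2)) (p2 ts1))
      (p1 ts)

let ( % ) (p : 'a parser) (f : 'a -> 'b) : 'b parser =
  fun ts -> List.map (fun (a, rest) -> (f a, rest)) (p ts)

let int_p : int parser = function
  | INT i :: rest -> [(i, rest)]
  | _ -> []

let tok (t : token) : unit parser = function
  | t' :: rest when t' = t -> [((), rest)]
  | _ -> []

(* exp -> INT | INT '+' exp | INT '*' exp *)
let rec exp_p ts =
  (int_p
   ++ (int_p $ tok PLUS $ exp_p) % (fun ((i, _), j) -> i + j)
   ++ (int_p $ tok TIMES $ exp_p) % (fun ((i, _), j) -> i * j)) ts

(* Keep only the results that used up every token. *)
let complete (p : 'a parser) (ts : token list) : 'a list =
  List.filter_map (fun (v, rest) -> if rest = [] then Some v else None) (p ts)
```

`complete exp_p [INT 2; TIMES; INT 3; PLUS; INT 4]` gives `[14]`, that is 2 * (3 + 4), rather than the conventional 10.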
Grouping
  exp -> term | term '+' exp
  term -> INT | INT '*' term

  let rec term ts =
    (int_p ++ (int_p $ tok TIMES $ term) % ...) ts
  and exp ts =
    (term ++ (term $ tok PLUS $ exp) % ...) ts
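Supplying our own semantic actions (multiplication for term, addition for exp) gives the runnable sketch below. It assumes list-of-successes combinators and hypothetical `token`, `tok` and `complete` definitions; none of these come from the lecture.

```ocaml
(* Assumed token type and parser representation. *)
type token = INT of int | PLUS | TIMES
type 'a parser = token list -> ('a * token list) list

let ( ++ ) (p1 : 'a parser) (p2 : 'a parser) : 'a parser =
  fun ts -> p1 ts @ p2 ts

let ( $ ) (p1 : 'a parser) (p2 : 'b parser) : ('a * 'b) parser =
  fun ts ->
    List.concat_map
      (fun (a, ts1) -> List.map (fun (b, ts2) -> ((a, b), ts2)) (p2 ts1))
      (p1 ts)

let ( % ) (p : 'a parser) (f : 'a -> 'b) : 'b parser =
  fun ts -> List.map (fun (a, rest) -> (f a, rest)) (p ts)

let int_p : int parser = function
  | INT i :: rest -> [(i, rest)]
  | _ -> []

let tok (t : token) : unit parser = function
  | t' :: rest when t' = t -> [((), rest)]
  | _ -> []

(* term -> INT | INT '*' term      exp -> term | term '+' exp *)
let rec term ts =
  (int_p ++ (int_p $ tok TIMES $ term) % (fun ((i, _), j) -> i * j)) ts
and exp ts =
  (term ++ (term $ tok PLUS $ exp) % (fun ((i, _), j) -> i + j)) ts

(* Keep only the results that used up every token. *)
let complete (p : 'a parser) (ts : token list) : 'a list =
  List.filter_map (fun (v, rest) -> if rest = [] then Some v else None) (p ts)
```

Now `complete exp [INT 2; TIMES; INT 3; PLUS; INT 4]` gives `[10]` and `complete exp [INT 3; PLUS; INT 2; TIMES; INT 6]` gives `[15]`: each product is grouped into a term before the sums are formed.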