More LR Parsing and Bison. CPSC 388, Ellen Walker, Hiram College. More than SLR(1): SLR(k) parsing uses multiple-token lookahead (for shifts) and multiple-token follow information (for reductions); general LR(1) parsing includes lookaheads in the DFA construction; LALR(1) parsing.
More LR Parsing and Bison CPSC 388 Ellen Walker Hiram College
More than SLR(1)
• SLR(k) parsing
  • Multiple-token lookahead (for shifts) and multiple-token follow information (for reductions)
• General LR(1) parsing
  • Include lookaheads in DFA construction
• LALR(1) parsing
  • Simplified state diagram for general LR(1): states that differ only in their lookaheads are merged
  • What YACC / Bison uses
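LALR: LR(0) + Lookahead
• NFA states are [LR(0) item, lookahead]
  • Examples: [S->.(S), $], [S->.a, )]
  • The symbol after the comma is the token expected right after the RHS
• Building the DFA
  • [S->.(S), $] --(--> [S->(.S), $]  (same as SLR; the lookahead is unchanged)
  • [S->(.S), $] --ε--> [S->.(S), )]  (propagate the lookahead)
  • There is one such ε-transition for every S-rule and every token in FIRST of what follows S in the original rule (or the original lookahead if nothing follows S)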
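The lookahead propagation above can be sketched as a small closure computation. This is only an illustration under stated assumptions: the grammar is the slide's S -> (S) | a, augmented with a start rule written Z -> S (Z stands in for S'), and the names `Item`, `firstOf`, and `closure` are made up for this sketch rather than taken from Bison.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Grammar (assumed for this sketch): Z -> S, S -> (S), S -> a.
// Upper-case letters are nonterminals; every other character is a terminal.
struct Production { char lhs; std::string rhs; };
const std::vector<Production> kGrammar = {{'Z', "S"}, {'S', "(S)"}, {'S', "a"}};

// An LR(1) item: (production index, dot position, lookahead token).
using Item = std::tuple<int, int, char>;

bool isNonterminal(char c) { return c >= 'A' && c <= 'Z'; }

// FIRST of one symbol. No rule here derives the empty string and none is
// left-recursive, so following the leftmost symbol of each rule is enough.
std::set<char> firstOf(char symbol) {
    if (!isNonterminal(symbol)) return {symbol};
    std::set<char> result;
    for (const Production& p : kGrammar) {
        assert(!p.rhs.empty());
        if (p.lhs == symbol && p.rhs[0] != symbol) {
            std::set<char> f = firstOf(p.rhs[0]);
            result.insert(f.begin(), f.end());
        }
    }
    return result;
}

// Follow the epsilon-transitions: for each item with the dot before a
// nonterminal, add that nonterminal's rules with the propagated lookaheads.
std::set<Item> closure(std::set<Item> items) {
    std::vector<Item> work(items.begin(), items.end());
    while (!work.empty()) {
        Item item = work.back();
        work.pop_back();
        const std::string& rhs = kGrammar[std::get<0>(item)].rhs;
        const int dot = std::get<1>(item);
        if (dot >= static_cast<int>(rhs.size()) || !isNonterminal(rhs[dot])) continue;
        // New lookaheads: FIRST of the symbol after the nonterminal, or the
        // item's own lookahead when nothing follows it.
        std::set<char> lookaheads = (dot + 1 < static_cast<int>(rhs.size()))
                                        ? firstOf(rhs[dot + 1])
                                        : std::set<char>{std::get<2>(item)};
        for (int i = 0; i < static_cast<int>(kGrammar.size()); ++i) {
            if (kGrammar[i].lhs != rhs[dot]) continue;
            for (char t : lookaheads) {
                Item next{i, 0, t};
                if (items.insert(next).second) work.push_back(next);
            }
        }
    }
    return items;
}
```

Calling `closure({Item{1, 1, '$'}})`, i.e. starting from [S->(.S), $], adds the items [S->.(S), )] and [S->.a, )], which matches the ε-transition shown on the slide.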
YACC / Bison
• “Yet Another Compiler-Compiler”
• Given a CFG, automatically creates the LALR table
• Using Bison:
  • Input file: grammar.y
  • Output file: grammar.tab.c
  • Generic main reads lines, executes rules
Structure of a Bison File
Definitions, including direct code in %{ %}
%%
Rules of the grammar, with actions
%%
Additional code, e.g. int main() { return yyparse(); }
Example: Expression Calculator
• Rules describe the usual grammar
  • S’ -> exp
  • exp -> exp + term | exp - term | term
  • term -> term * factor | factor
  • factor -> NUMBER | ( exp )
• Associated actions execute the arithmetic
Bison Rule Syntax
• LHS followed by :
• Each alternative is followed by its action; alternatives are separated by |
• ; after the last action
• Example
  factor : NUMBER      {$$ = $1;}
         | '(' exp ')' {$$ = $2;}
         ;
Bison Actions
• Rules include actions (code) in { }
• Predefined variables:
  • $$ value of the result of the rule (type YYSTYPE, int by default)
  • $1 value of the first RHS symbol, $2 value of the second, etc.
• Example
  exp : exp '+' term {$$ = $1 + $3;}
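To make the $-variables concrete, here is a minimal sketch of what a reduction by the example rule does to the parser's value stack. It assumes semantic values are plain ints, and `reduceAdd` is a hypothetical helper written for illustration; the code Bison actually generates is organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: semantic values are ints (the default YYSTYPE).
using SemanticValue = int;

// Hypothetical helper mimicking a reduction by
//   exp : exp '+' term  {$$ = $1 + $3;}
// The top three slots of the value stack belong to the three RHS symbols.
void reduceAdd(std::vector<SemanticValue>& valueStack) {
    assert(valueStack.size() >= 3);
    const std::size_t n = valueStack.size();
    const SemanticValue d1 = valueStack[n - 3];  // $1: value of exp
    // valueStack[n - 2] is $2, the slot for '+'; this action does not use it.
    const SemanticValue d3 = valueStack[n - 1];  // $3: value of term
    const SemanticValue result = d1 + d3;        // $$ = $1 + $3
    valueStack.resize(n - 3);                    // pop the three RHS values
    valueStack.push_back(result);                // push $$ for the LHS
}
```

With a stack holding 10, 2, 0, 5 (bottom to top), the call leaves 10, 7: the three slots for exp, '+', and term are replaced by the single value of the new exp.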
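Bison and Flex together
• Define tokens in the definition section:
  • %token ID val  (val is an optional integer code)
  • Choose values > 256
• Make sure lex.yy.c and grammar.tab.c agree on token ID definitions (running bison -d writes them to grammar.tab.h, which the flex file can #include)
  • #define ID val
• Compile both together
  • g++ -o myparser grammar.tab.c lex.yy.c -lfl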
Flex for Bison
• Each rule should return a token type
  • E.g. return NUMBER;
• In addition, a token value can be saved in the global variable yylval
  • E.g. yylval = myAtoI(yytext);
Mixing Characters and Tokens
• Don’t assign token values < 256; those codes are reserved for single characters
• Allow characters to be their own tokens (rule):
  . return(yytext[0]);
• Or be specific:
  [-+*()] return(yytext[0]);
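The split between named tokens and single-character tokens can be sketched with a hand-written stand-in for the flex-generated scanner. Everything here is an assumption for illustration: `nextToken` is not a flex function, the code 258 for NUMBER is just an example value above the character range, and the `yylval` below is a local stand-in for the real global.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Assumed token code; in a real build it comes from the Bison-generated header.
const int NUMBER = 258;
// Stand-in for the global that carries the semantic value of the last token.
int yylval = 0;

// Returns 0 at end of input, NUMBER for a run of digits (value in yylval),
// and otherwise the character itself as its own token code.
int nextToken(const std::string& input, std::size_t& pos) {
    assert(pos <= input.size());
    while (pos < input.size() &&
           std::isspace(static_cast<unsigned char>(input[pos]))) {
        ++pos;  // skip whitespace between tokens
    }
    if (pos >= input.size()) return 0;
    const unsigned char c = static_cast<unsigned char>(input[pos]);
    if (std::isdigit(c)) {
        yylval = 0;
        while (pos < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[pos]))) {
            yylval = yylval * 10 + (input[pos++] - '0');
        }
        return NUMBER;  // named token: code above the character range
    }
    ++pos;
    return c;  // like the flex rule:  . return(yytext[0]);
}
```

Because character codes stay below 256 and named tokens sit above them, the parser can tell '+' from NUMBER without any extra bookkeeping.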
A Few Gotchas
• Bison (and flex) like tabs, not spaces
• Beware of commenting out a closing } with //
• C (++) requires functions to be declared before they are used
  • Copy signatures to the top of the file
  • “extern” for functions and variables defined in other files
Bison Individual Homework
• Use Bison to parse and interpret simple LISP-like commands
  • (cons a (cons b nil)) => (a b)
  • (cons (cons a nil) (cons b nil)) => ((a) b)
  • (car (cons a (cons b nil))) => a
  • (cdr (cons a (cons b nil))) => (b)
  • (cdr (cdr (cons a (cons b nil)))) => nil
• See handout for details
Error Handling in Bottom-Up (BU) Parsing
• Error = blank entry in the parsing table
• To give specific error messages
  • Many distinct error entries (but a bigger table!)
• Detect the error before reducing when possible
  • LR(1) is better than SLR(1) here
Recovery
• Panic mode:
  • Pop states from the stack until the parse can be restarted
  • Advance input until a legal transition is available
• Error productions
  • Treat “error” as a pseudotoken
  • Rules indicate how much to throw away
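The two panic-mode steps can be sketched as follows. This is a simplified illustration, not Bison's actual recovery routine: the set of states that can restart the parse and the set of synchronizing tokens are passed in as hypothetical inputs, and `panicRecover` and `RecoveryResult` are names invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Outcome of one recovery attempt (illustrative type).
struct RecoveryResult {
    bool recovered;             // true if parsing can resume
    std::size_t tokensSkipped;  // how many input tokens were discarded
};

// Assumed inputs: restartStates lists the states from which the parse can
// be restarted; syncTokens lists the tokens that give a legal transition.
RecoveryResult panicRecover(std::vector<int>& stateStack,
                            const std::set<int>& restartStates,
                            const std::set<int>& syncTokens,
                            const std::vector<int>& input,
                            std::size_t& pos) {
    assert(pos <= input.size());
    // Step 1: pop states until one allows the parse to be restarted.
    while (!stateStack.empty() && restartStates.count(stateStack.back()) == 0) {
        stateStack.pop_back();
    }
    if (stateStack.empty()) return {false, 0};
    // Step 2: advance the input until a legal transition is available.
    const std::size_t start = pos;
    while (pos < input.size() && syncTokens.count(input[pos]) == 0) {
        ++pos;
    }
    return {pos < input.size(), pos - start};
}
```

For a stack 0, 3, 7 where only state 0 can restart, and input `+ * ; a` with `;` as the only synchronizing token, the sketch pops down to state 0 and skips two tokens before stopping at `;`.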
Error Example
• command : exp   {cout << $1 << endl;}
          | error {yyerror("bad cmd");}
          ;
• Once a command is in error, the parser will
  • Perform the error action
  • Delete tokens until a legal follow of command ($ here)