Parser construction tools: YACC
• Yacc is available on Unix systems; it creates LALR parsers in C
• Build pipeline:
    yacc specification → yacc → y.tab.c
    y.tab.c + -ly library + more of your C → C compiler → a.out
    source → a.out → output
• The yacc specification may '#include' a lexical analyzer produced by Lex, or by other means
• y.tab.c contains the LALR parser routine yyparse, which uses the parsing table built by yacc and calls the lexer 'yylex'; the -ly library supplies default main and yyerror routines
http://csiweb.ucd.ie/staff/acater/comp30330.html Compiler Construction

The three parts of a yacc specification
• declarations
    • ordinary C, enclosed in %{ … %}, copied verbatim into y.tab.c
    • declarations for use by yacc, such as %token, %left, %right, %nonassoc
• separator: %%
• grammar rules. Each one has
    • a nonterminal name followed by a colon
    • productions separated by vertical bars, each possibly with additional semantic actions and precedence information
    • a final semicolon
• separator: %%
• supporting C routines
    • there must at least be a lexical analyser named yylex
    • commonly accomplished by writing #include "lex.yy.c", where the lex program has been used to build the lexer; the lexer can also be hand-written

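As a hedged illustration of the hand-written alternative, the sketch below shows a lexer with the name and return convention yacc expects. The token code NUMBER, the definition of yylval, and the helper set_input are assumptions made so that the fragment compiles on its own; in a real build yacc generates the token codes and yylval in y.tab.c, and the lexer would usually read standard input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed stand-ins: yacc normally generates these in y.tab.c. */
enum { NUMBER = 258 };          /* arbitrary code above the character range */
int yylval;                     /* attribute value of the latest token */

static const char *input = "";  /* hypothetical in-memory input buffer */

/* Hypothetical helper: point the lexer at a string to scan. */
void set_input(const char *text)
{
    assert(text != NULL);
    input = text;
}

/* Return the next token code; 0 tells the parser that input has ended. */
int yylex(void)
{
    while (*input == ' ' || *input == '\t')
        input++;                                /* skip blanks, keep '\n' */
    if (*input == '\0')
        return 0;
    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))  /* accumulate a decimal number */
            yylval = yylval * 10 + (*input++ - '0');
        return NUMBER;
    }
    return (unsigned char)*input++;  /* single characters stand for themselves */
}
```

Calling set_input("12 + 3\n") and then yylex() repeatedly yields NUMBER (with yylval 12), '+', NUMBER (with yylval 3), '\n', and finally 0.
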
Simple Desk-Calculator example
    %{
    #include <ctype.h>   /* declares isdigit, among others */
    #include <stdio.h>   /* declares printf and getchar */
    int yylex(void);     /* the parser calls yylex before its definition below */
    %}
    %token DIGIT         /* declares the token DIGIT for use in grammar rules and also in lexer code */
    %%
    line   : expr '\n'         { printf("%d\n", $1); }   /* a semantic rule */
           ;
    expr   : expr '+' term     { $$ = $1 + $3; }
           | term              /* default semantic rule $$ = $1 */
           ;
    term   : term '*' factor   { $$ = $1 * $3; }
           | factor
           ;
    factor : '(' expr ')'      { $$ = $2; }
           | DIGIT
           ;
    %%
    /* Hand-written lexer. Alternatively, #include "lex.yy.c" here to use the yylex routine built by Lex. */
    int yylex(void) {
        int c;
        c = getchar();
        if (isdigit(c)) {
            yylval = c - '0';   /* the lexer uses the C variable yylval to communicate the attribute value */
            return DIGIT;
        }
        return c;
    }
• The default semantic rule $$ = $1 is useful for single productions

Ambiguous grammars in Yacc
• Yacc declarations allow shift/reduce and reduce/reduce conflicts to be resolved using operator precedence and operator associativity information
• Yacc does have default methods for resolving conflicts, but it is considered wise to find out (using the -v option, which writes a description of the parser to y.output) what conflicts arose and how they were resolved
• The declarations provide a way to override Yacc's defaults
• Productions have the precedence of their rightmost terminal, unless otherwise specified by a %prec element
• The declaration keywords %left, %right and %nonassoc inform Yacc that the tokens following are to be treated as left-associative (as binary +, -, * and / commonly are), right-associative (as exponentiation and assignment often are), or non-associative (as binary < and > often are)
• The order of declarations informs yacc that the tokens should be accorded increasing precedence
    %left '+' '-'
    %left '*' '/'
• The effect is that * has higher precedence than +, so x+y*z is grouped like x+(y*z)

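The way these declarations settle a shift/reduce conflict can be sketched as a small decision function. This is a simplified model, not yacc's own source: the names Assoc, Action and resolve are hypothetical, precedence levels are plain integers (1 for the first declaration line, 2 for the next), and the sketch assumes that both the production and the lookahead token have a declared precedence.

```c
#include <assert.h>

typedef enum { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONE } Assoc;
typedef enum { ACT_SHIFT, ACT_REDUCE, ACT_ERROR } Action;

/*
 * rule_prec: precedence of the production that could be reduced
 *            (that of its rightmost terminal, or of its %prec token)
 * tok_prec:  precedence of the lookahead token
 * tok_assoc: associativity declared for the lookahead token
 */
Action resolve(int rule_prec, int tok_prec, Assoc tok_assoc)
{
    assert(rule_prec > 0 && tok_prec > 0);
    if (rule_prec > tok_prec)
        return ACT_REDUCE;      /* the production binds tighter */
    if (rule_prec < tok_prec)
        return ACT_SHIFT;       /* the lookahead binds tighter */
    switch (tok_assoc) {        /* equal precedence: associativity decides */
    case ASSOC_LEFT:
        return ACT_REDUCE;      /* x-y-z groups as (x-y)-z */
    case ASSOC_RIGHT:
        return ACT_SHIFT;       /* groups to the right */
    default:
        return ACT_ERROR;       /* %nonassoc: x<y<z is rejected */
    }
}
```

With the two %left lines above, seeing '*' after x+y gives resolve(1, 2, ASSOC_LEFT), a shift, so y*z is grouped first; seeing '+' after x*y gives resolve(2, 1, ASSOC_LEFT), a reduce.
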
Semantic actions in Yacc
• Each time the lexer returns a token, it can also produce an attribute value in the variable named yylval
• Attribute values for nonterminals can also be produced by semantic actions
    • a semantic action is several C statements enclosed in { … }
    • $$ refers to the attribute value for the lhs nonterminal
    • $1, $2 etc. refer to the attribute values for successive rhs grammar symbols
• The Desk Calculator example uses only simple arithmetic operations. True compilers can have much more complex code in their productions' semantic actions

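One way to picture $$ and $k is as positions on a value stack kept alongside the parse stack. The sketch below is a simplified model under that assumption, with hypothetical names (vstack, push_value, rhs_value, reduce_add, top_value); it is not the code yacc generates.

```c
#include <assert.h>

#define STACK_MAX 64

static int vstack[STACK_MAX];   /* attribute values of the symbols on the parse stack */
static int vtop = 0;            /* number of values currently held */

/* Push the attribute value of a shifted token or a reduced nonterminal. */
void push_value(int v)
{
    assert(vtop < STACK_MAX);
    vstack[vtop++] = v;
}

/* Value of $k for a right-hand side of n symbols sitting on top of the stack. */
int rhs_value(int n, int k)
{
    assert(1 <= k && k <= n && n <= vtop);
    return vstack[vtop - n + k - 1];
}

/* Reduce by  expr : expr '+' term  { $$ = $1 + $3; } */
void reduce_add(void)
{
    int result = rhs_value(3, 1) + rhs_value(3, 3);   /* $$ = $1 + $3 */
    vtop -= 3;                  /* pop the three right-hand-side symbols */
    push_value(result);         /* push $$ for the left-hand-side nonterminal */
}

/* Attribute value of the symbol on top of the stack. */
int top_value(void)
{
    assert(vtop > 0);
    return vstack[vtop - 1];
}
```

Pushing 2, then a placeholder value for '+', then 3, and calling reduce_add() leaves a single value, 5, on the stack.
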
Bigger Desk-Calculator example
    %{
    #include <ctype.h>
    #include <stdio.h>
    #define YYSTYPE double   /* double type for Yacc stack */
    %}
    %token NUMBER
    %left '+' '-'
    %left '*' '/'
    %right UMINUS
    %%
    lines : lines expr '\n'   { printf("%g\n", $2); }
          | lines '\n'
          | /* empty */
          | error '\n'        { yyerror("reenter previous line"); yyerrok; }
          ;
    expr  : expr '+' expr     { $$ = $1 + $3; }
          | expr '-' expr     { $$ = $1 - $3; }
          | expr '*' expr     { $$ = $1 * $3; }
          | expr '/' expr     { $$ = $1 / $3; }
          | '(' expr ')'      { $$ = $2; }
          | '-' expr %prec UMINUS   { $$ = -$2; }
          | NUMBER
          ;
    %%
    #include "lex.yy.c"

Error handling in Yacc-generated parsers
• Rules may include error productions for selected nonterminals
    stmt : β {…} | γ {…} | δ {…} | error α {…}
• error is a Yacc reserved word
• If the parser has no action for a combination of {state, input token}, then
    • it pops states from its stack until it finds one with an error production among its items
    • it pushes "error" onto its symbol stack
    • it scans the input stream for a sequence reducible to α
        • which may be empty
    • it pushes all of α onto its symbol stack
    • it reduces according to the error production
        • which may cause semantic actions to be carried out
        • often involving the routines yyerror(msg) and yyerrok

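The recovery steps above can be sketched in a few lines of C. This is a deliberately reduced model rather than yacc's implementation: the State struct and the recover function are hypothetical, input is a plain character string, and α is fixed to the single token '\n', as in the desk-calculator rule error '\n'.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int id;               /* state number (hypothetical) */
    int has_error_item;   /* nonzero if the state holds an item  A : . error α  */
} State;

/*
 * stack[0 .. *depth-1] is the parse stack, input is NUL-terminated and
 * *pos is the position of the offending token.
 * Returns 1 if recovery succeeded, 0 otherwise.
 */
int recover(const State *stack, size_t *depth, const char *input, size_t *pos)
{
    assert(stack != NULL && depth != NULL && input != NULL && pos != NULL);

    /* Step 1: pop states until one can accept the "error" symbol. */
    while (*depth > 0 && !stack[*depth - 1].has_error_item)
        (*depth)--;
    if (*depth == 0)
        return 0;         /* no error production applies: give up */

    /* Step 2: discard input until α (here '\n') is found. */
    while (input[*pos] != '\0' && input[*pos] != '\n')
        (*pos)++;
    if (input[*pos] == '\0')
        return 0;         /* input ended before α appeared */

    (*pos)++;             /* consume α; the parser would now reduce by the error production */
    return 1;
}
```

For the input "1+*2\n3+4\n" with the error detected at the '*', recovery skips to the start of the second line, which mirrors the "reenter previous line" behaviour of the calculator.
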
Some other free parser generators
• see e.g. www.thefreecountry.com/programming/compilerconstruction.html