Yacc
BNF grammar (example.y) -> YACC -> example.tab.c -> C compiler + linker (together with other modules) -> Executable
Yacc: what is it? Yacc is a tool that automatically generates a parser from a grammar written as a yacc specification (.y file). The grammars accepted are LALR(1) grammars with disambiguating rules. A grammar is a set of production rules that defines a language. Each production rule specifies a sequence of symbols that is legal in the language.
Structure of Yacc • Lex and Yacc usually work together • The parser calls yylex() to get the next token • To run the parser, the function yyparse() is invoked
How the parser works • The parser produced by Yacc consists of a finite state machine with a stack • A move of the parser is done as follows: • It calls yylex() to obtain the next token, when one is needed • Using the current state and the lookahead token, the parser decides on its next action (shift, reduce, accept or error) and carries it out
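As an illustration of the state machine with a stack described above, here is a minimal hand-written sketch in C. It is not the code that Yacc generates. It assumes a tiny grammar, E : E '+' NUM | NUM, and the names parse_sum and next_token, the token code NUM and the five state numbers are hypothetical, worked out by hand for this grammar only.

```c
#include <assert.h>

enum { TOK_END = 0, NUM = 258 };   /* '+' is passed as its own character code */

/* Hypothetical stand-in for yylex(): hands out tokens from an array
   that must be terminated by TOK_END. */
static const int *cursor;

int next_token(void)
{
    return *cursor == TOK_END ? TOK_END : *cursor++;
}

/* Recognizes E : E '+' NUM | NUM.
   Returns 0 on accept and 1 on a syntax error, like yyparse(). */
int parse_sum(const int *tokens)
{
    int stack[8];   /* state stack; this grammar never needs more than 4 entries */
    int top = 0;
    int tok;

    cursor = tokens;
    stack[0] = 0;
    tok = next_token();              /* the lookahead token */

    for (;;) {
        switch (stack[top]) {
        case 0:                      /* start: only NUM can be shifted */
            if (tok != NUM) return 1;
            stack[++top] = 2;        /* shift */
            tok = next_token();
            break;
        case 1:                      /* an E has been recognized */
            if (tok == TOK_END) return 0;   /* accept */
            if (tok != '+') return 1;       /* error */
            stack[++top] = 3;        /* shift '+' */
            tok = next_token();
            break;
        case 2:                      /* reduce by E : NUM */
            assert(top >= 1);
            top -= 1;                /* pop one state per right-hand-side symbol */
            stack[++top] = 1;        /* goto on E */
            break;
        case 3:                      /* after E '+': only NUM can be shifted */
            if (tok != NUM) return 1;
            stack[++top] = 4;
            tok = next_token();
            break;
        case 4:                      /* reduce by E : E '+' NUM */
            assert(top >= 3);
            top -= 3;
            stack[++top] = 1;
            break;
        default:
            return 1;
        }
    }
}
```

Calling parse_sum on the token sequence NUM '+' NUM followed by TOK_END returns 0, while NUM '+' '+' NUM is rejected with 1. A generated parser drives the same loop from tables instead of a hand-written switch.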
Skeleton of a yacc specification (.y file)
  {declarations}
  %%
  {rules}
  %%
  {user code}
Rules: <production> action. The productions are those of a type 2 (context-free) grammar. Action: C code that specifies what to do when a production is reduced
Skeleton of a yacc specification (.y file)
  %{
  < C global variables, prototypes, comments >
  %}
  [DEFINITION SECTION]
  %%
  [PRODUCTION RULES SECTION]
  %%
  < C auxiliary subroutines >
• %{ ... %}: this part will be embedded into the generated *.c file
• Definition section: contains token declarations. Tokens are recognized in the lexer
• Production rules section: defines how to "understand" the input language, and what actions to take for each "sentence"
• C auxiliary subroutines: any user code, for example a main function that calls the parser function yyparse()
Structure of yacc file • Definition section • declarations of tokens • type of values used on parser stack • Rules section • list of grammar rules with semantic routines • User code
The declaration section • Terminals and non-terminals: %token symbol, %type symbol • Operator precedence and operator associativity: %nonassoc symbol, %left symbol, %right symbol • Axiom (start symbol): %start symbol
The declaration section: terminals • They are returned by the yylex() function, which is called by yyparse() • They become #define constants in the generated file • They are numbered starting from 257 (some implementations, such as Bison, start at 258, as in the generated header shown later). A specific number can also be associated with a token: %token T_Key 345 • Terminals that consist of a single character can be used directly (they are implicit). The corresponding tokens have their character codes as values, which are below 256
The declaration section: examples
expressions.y
  %{
  #include <stdio.h>
  %}
  %token NUMBER PLUS MINUS MUL DIV L_PAR R_PAR
  %start expr
  …
The declaration section: examples
patterns.l
  %{
  #include <stdlib.h>              /* atoi */
  #include "expressions.tab.h"
  %}
  digit [0-9]
  %%
  [ \t]+     ;
  {digit}+   {yylval=atoi(yytext); return NUMBER;}
  "+"        return PLUS;
  "-"        return MINUS;
  "*"        return MUL;
  "/"        return DIV;
  "("        return L_PAR;
  ")"        return R_PAR;
  .          {printf("token erroneous\n");}
The declaration section: examples (single-character tokens used directly)
Yacc: only NUMBER needs a declaration, the character literals '+', '-', '*', '/', '(' and ')' are implicit
  . . .
  %token NUMBER
  . . .
Lex:
  . . .
  digit [0-9]
  %%
  [ \t]+     ;
  {digit}+   {yylval=atoi(yytext); return NUMBER;}
  "+"        return '+';
  "-"        return '-';
  "*"        return '*';
  "/"        return '/';
  "("        return '(';
  ")"        return ')';
  . . .
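To make the contract between the scanner and the parser concrete, the lex rules above can be approximated by a hand-written scanner in C. This is only a sketch of what a yylex() does, not what lex generates: the helper scan_from and the in-memory input string are assumptions for illustration, and the value 258 for NUMBER is taken from the generated header shown later.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define NUMBER 258              /* named tokens get codes above 256 */

int yylval;                     /* semantic value of the last token */
static const char *input = "";  /* hypothetical in-memory input */

/* Hypothetical helper: select the text to scan. */
void scan_from(const char *text)
{
    assert(text != NULL);
    input = text;
}

/* Returns NUMBER, a single-character token as its own character code,
   or 0 at the end of the input. */
int yylex(void)
{
    for (;;) {
        int c = (unsigned char)*input;

        if (c == '\0')
            return 0;                        /* end of input */
        input++;
        if (c == ' ' || c == '\t')
            continue;                        /* skip blanks */
        if (isdigit(c)) {
            yylval = c - '0';
            while (isdigit((unsigned char)*input))
                yylval = yylval * 10 + (*input++ - '0');
            return NUMBER;
        }
        switch (c) {
        case '+': case '-': case '*': case '/': case '(': case ')':
            return c;                        /* implicit single-character token */
        default:
            fprintf(stderr, "token erroneous\n");   /* report and keep scanning */
        }
    }
}
```

After scan_from("12+(3)"), successive calls to yylex() return NUMBER (with yylval set to 12), '+', '(', NUMBER (with yylval set to 3), ')' and finally 0.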
Flex/Yacc communication
  lex file.l                        -> lex.yy.c
  yacc -d file.y                    -> file.tab.c and the header file.tab.h (shared with file.l)
  cc lex.yy.c -c                    -> lex.yy.o
  cc file.tab.c -c                  -> file.tab.o
  gcc lex.yy.o file.tab.o -o calc   -> calc
Lex/Yacc: lex file
  %{
  #include <stdlib.h>              /* atoi */
  #include "expressions.tab.h"     /* generated by Yacc */
  %}
  digit [0-9]
  %option noyywrap
  %%
  [ \t]+     ;
  {digit}+   {yylval=atoi(yytext);
              /*printf("lex: %s, %d\n ",yytext, yylval);*/
              return NUMBER;}
  "+"        return PLUS;
  "-"        return MINUS;
  .          {printf("token erroneous\n");}
  %%
  /* no main() in the lex file */
Flex/Yacc communication
expressions.tab.h
  #ifndef YYSTYPE
  #define YYSTYPE int
  #endif
  #define NUMBER 258
  #define PLUS 259
  #define MINUS 260
  #define MUL 261
  #define DIV 262
  #define L_PAR 263
  #define R_PAR 264
The Production Rules Section
  %%
  production : symbol1 symbol2 … { action }
             | symbol3 symbol4 … { action }
             | …
             ;
  production : symbol1 symbol2 { action }
             ;
  %%
Semantic values
  %%
  statement  : expression { printf(" = %g\n", $1); }   /* %g assumes YYSTYPE is double */
             ;
  expression : expression '+' expression { $$ = $1 + $3; }
             | expression '-' expression { $$ = $1 - $3; }
             | NUMBER { $$ = $1; }
             ;
  %%
According to these productions, 5 + 4 - 3 + 2 is parsed into:
[Figure: parse tree rooted at statement, in which expression nodes combine the numbers 5, 4, 3 and 2 through the operators +, - and +]
Defining Values
  expr   : expr '+' term     { $$ = $1 + $3; }
         | term              { $$ = $1; }
         ;
  term   : term '*' factor   { $$ = $1 * $3; }
         | factor            { $$ = $1; }
         ;
  factor : '(' expr ')'      { $$ = $2; }
         | ID
         | NUM
         ;
Defining Values • $1, $2 and $3 are the values of the first, second and third symbols on the right-hand side of a production, for example expr, '+' and term in expr : expr '+' term • $$ is the value of the symbol on the left-hand side • Default: $$ = $1;
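One way to picture $$ and $n is as positions on a value stack that the parser keeps next to its state stack. The following C sketch makes that assumption explicit. The names value_stack, push_value, reduce and the ACT_ constants are hypothetical, and a generated parser does not expose such an interface.

```c
#include <assert.h>

#define STACK_MAX 32

struct value_stack {
    int v[STACK_MAX];
    int top;                    /* number of values currently on the stack */
};

/* A shift pushes the token's value (operators carry a dummy 0 here). */
void push_value(struct value_stack *s, int x)
{
    assert(s->top < STACK_MAX);
    s->v[s->top++] = x;
}

enum action { ACT_DEFAULT, ACT_ADD, ACT_MUL, ACT_PAREN };

/* Reduce by a production with rhs_len right-hand-side symbols:
   rhs[0] plays the role of $1, rhs[1] of $2, rhs[2] of $3, result of $$. */
int reduce(struct value_stack *s, int rhs_len, enum action act)
{
    const int *rhs;
    int result;

    assert(rhs_len >= 1 && rhs_len <= s->top);
    rhs = &s->v[s->top - rhs_len];

    switch (act) {
    case ACT_ADD:   assert(rhs_len == 3); result = rhs[0] + rhs[2]; break; /* $$ = $1 + $3 */
    case ACT_MUL:   assert(rhs_len == 3); result = rhs[0] * rhs[2]; break; /* $$ = $1 * $3 */
    case ACT_PAREN: assert(rhs_len == 3); result = rhs[1];          break; /* $$ = $2 */
    default:        result = rhs[0];                                break; /* $$ = $1 */
    }

    s->top -= rhs_len;          /* pop the right-hand side ... */
    push_value(s, result);      /* ... and push the value of the left-hand side */
    return result;
}
```

For 5 + 4, pushing 5, a dummy value for '+' and 4, then calling reduce(&s, 3, ACT_ADD), leaves the single value 9 on the stack.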
The declaration section • Support for arbitrary value types
  %union {
      int intval;
      char *str;
  }
The declaration section • Use of the union
• Terminal declaration: %token <intval> NATURAL
• Non-terminal declaration: %type <type> NO_TERMINAL
• In productions: expr : NATURAL '+' NATURAL { $$ = $<intval>1 + $<intval>3; } ;
• In the lex file: {digit}+ { yylval.intval = atoi(yytext); return NATURAL; }
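In C terms, the %union declaration becomes the type of yylval, and the scanner fills in whichever member matches the token it returns. The sketch below writes that type out by hand to show the idea. The function classify and the token IDENT are hypothetical, and the token codes are assumed values of the kind found in a generated header.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hand-written equivalent of: %union { int intval; char *str; } */
typedef union {
    int   intval;
    char *str;
} YYSTYPE;

enum { NATURAL = 258, IDENT = 259 };   /* assumed codes; IDENT is hypothetical */

/* Hypothetical helper: decide which token a lexeme is and store its value
   in the matching union member. Returns 0 for an empty lexeme. */
int classify(char *text, YYSTYPE *value)
{
    const char *p = text;

    assert(text != NULL && value != NULL);
    if (*p == '\0')
        return 0;
    while (isdigit((unsigned char)*p))
        p++;
    if (*p == '\0') {                  /* digits only: a NATURAL */
        value->intval = atoi(text);    /* like yylval.intval = atoi(yytext) */
        return NATURAL;
    }
    value->str = text;                 /* anything else: keep the text itself */
    return IDENT;
}
```

The parser must read back the same member that the scanner wrote, which is what the <intval> tags in %token and %type record.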
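Ambiguity • By default yacc does the following: • s/r (shift/reduce conflict): chooses shift over reduce • r/r (reduce/reduce conflict): reduces by the production that appears first in the grammar • It is better to resolve the conflicts by declaring operator precedence and associativity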
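The way precedence declarations settle a shift/reduce conflict can be sketched as a small decision function: the precedence of the production that could be reduced is compared with that of the lookahead token, and associativity breaks ties. The C code below illustrates that rule only. The types and the name resolve_shift_reduce are hypothetical, and level 0 is an assumed encoding for "no precedence declared".

```c
#include <assert.h>

enum assoc    { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC };
enum decision { DO_SHIFT, DO_REDUCE, DO_ERROR };

struct prec {
    int        level;   /* 0 = no precedence declared; a higher level binds tighter */
    enum assoc assoc;
};

/* rule:  precedence of the production that could be reduced
   token: precedence of the lookahead token that could be shifted */
enum decision resolve_shift_reduce(struct prec rule, struct prec token)
{
    assert(rule.level >= 0 && token.level >= 0);

    if (rule.level == 0 || token.level == 0)
        return DO_SHIFT;             /* no information: default to shift */
    if (token.level > rule.level)
        return DO_SHIFT;             /* the token binds tighter */
    if (token.level < rule.level)
        return DO_REDUCE;            /* the rule binds tighter */

    switch (token.assoc) {           /* same level: use associativity */
    case ASSOC_LEFT:  return DO_REDUCE;
    case ASSOC_RIGHT: return DO_SHIFT;
    default:          return DO_ERROR;   /* %nonassoc */
    }
}
```

With '+' declared %left at a lower level than '*', the input a + b * c shifts the '*' while a * b + c reduces first, and a + b + c reduces first because '+' is left-associative.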
Error recovery • Yacc detects syntax errors • To report errors, a function needs to be implemented:
  int yyerror(char *s) { fprintf(stderr, "%s", s); return 0; }
• Panic mode recovery:
  E : IF '(' cond ')'
    | IF '(' error ')' { yyerror("condition missing"); }
    ;
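Error recovery • After detecting an error, the parser stays in error-recovery mode until three tokens have been read and shifted successfully. Further errors detected in the meantime are not reported • yyerrok resets the parser to its normal mode immediately • yyclearin allows the lookahead token that caused the error to be discarded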
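The three-token rule can be pictured as a counter that is set when an error is detected and counts down on every successful shift. The C sketch below models only that bookkeeping, under the assumption that messages are suppressed while the counter is non-zero. The struct and function names are hypothetical and do not belong to any yacc interface.

```c
#include <assert.h>

struct recovery {
    int errstatus;   /* shifts still needed before leaving recovery mode */
    int reported;    /* number of error messages actually issued */
};

/* Called when the parser detects a syntax error. */
void on_syntax_error(struct recovery *r)
{
    if (r->errstatus == 0)
        r->reported++;       /* only report when in normal mode */
    r->errstatus = 3;        /* (re)start the three-token countdown */
}

/* Called each time a token is shifted successfully. */
void on_shift(struct recovery *r)
{
    assert(r->errstatus >= 0);
    if (r->errstatus > 0)
        r->errstatus--;
}

/* Models yyerrok: return to normal mode at once. */
void errok(struct recovery *r)
{
    r->errstatus = 0;
}
```

An error that occurs one shift after a previous error is counted as part of the same incident, whereas an error that follows a call to errok is reported again straight away.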