lex & yacc Tutorial Feb. 18, 2005
Outline
• Overview of lex and yacc
• Structure of a lex specification
• Regular expressions
• Structure of a yacc specification
• Some hints for Lab 1
Scanner, parser, lex and yacc
source program → Scanner → token → Parser (scanner and parser both use the symbol table)
Lex spec (.l) → lex/flex → lex.yy.c (the scanner)
Yacc spec (.y) → yacc/bison → y.tab.c (the parser)
Scanner, parser, lex and yacc (cont)
Input "23+15" → Scanner → token → Parser
lex.yy.c (generated by lex/flex from the lex spec, .l):
    int yylex(void) { … yytext = "23"; yylval = 23; return NUM; … }
y.tab.c (generated by yacc/bison from the yacc spec, .y):
    int yyparse(void) { while (ret = yylex()) { … } }
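To make the scanner/parser hand-off concrete, here is a minimal hand-written stand-in for the scanner side, assuming integer tokens only. The names toy_yylex, toy_yylval and scan_string, and the token code 258 for NUM, are hypothetical choices for this sketch; the real yylex() in lex.yy.c is generated from the .l file and looks quite different internally.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token code; yacc would normally define NUM in y.tab.h. */
enum { NUM = 258 };

static const char *input_pos;   /* current read position in the input */
int toy_yylval;                 /* semantic value of the last token */

/* Point the scanner at a new input string. */
void scan_string(const char *s) { input_pos = s; }

/* Stand-in for a generated yylex(): returns NUM for a run of digits,
   the character itself for anything else, and 0 at end of input. */
int toy_yylex(void)
{
    while (*input_pos == ' ' || *input_pos == '\t')
        input_pos++;                        /* skip blanks */
    if (*input_pos == '\0')
        return 0;                           /* end of input */
    if (isdigit((unsigned char)*input_pos)) {
        char *end;
        toy_yylval = (int)strtol(input_pos, &end, 10);
        input_pos = end;                    /* consume the whole number */
        return NUM;
    }
    return (unsigned char)*input_pos++;     /* single-character token */
}
```

For the input "23+15" the parser would see NUM (value 23), then '+', then NUM (value 15), then 0, which is what ends the while loop in yyparse() on the slide.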
Skeleton of a lex specification (.l file)
lex.yy.c is generated by running > lex x.l
x.l:
    %{
    < C global variables, prototypes, comments >
    %}
    [DEFINITION SECTION]
    %%
    [RULES SECTION]
    %%
    < C auxiliary subroutines >
• %{ … %}: this part is embedded into lex.yy.c.
• Definition section: substitutions, code and start states; copied into lex.yy.c.
• Rules section: defines how to scan and what action to take for each token.
• C auxiliary subroutines: any user code, for example a main function that calls the scanning function yylex().
The rules section of a lex specification file
The rules section is a list of rules:
    <pattern> { corresponding actions }
    <pattern> { corresponding actions }
    …
Example:
    [1-9][0-9]* { yylval = atoi(yytext); return NUMBER; }
The pattern is written as a regular expression; the actions are C statements.
Two notes on using lex
1. Longest match: for input "abc" and the rule [a-z]+ {…}, the token is "abc", not "a" or "ab".
2. If more than one rule matches the same longest text, the rule listed first wins. For input "post" with rule 1 = "post" { printf("hi"); } and rule 2 = [a-zA-Z]+ { printf("Hi!!!!!"); }, rule 1 is applied.
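The two notes can be simulated in plain C. This is only a sketch of the selection policy, not how lex works internally (lex compiles all patterns into a single automaton); the function names match_post, match_letters and pick_rule are made up for the illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the prefix of s matched by the literal "post" (0 if none). */
static size_t match_post(const char *s)
{
    return strncmp(s, "post", 4) == 0 ? 4 : 0;
}

/* Length of the prefix of s matched by [a-zA-Z]+ (0 if none). */
static size_t match_letters(const char *s)
{
    size_t n = 0;
    while (isalpha((unsigned char)s[n]))
        n++;
    return n;
}

typedef size_t (*matcher)(const char *);

/* Rules in the order they would appear in the .l file. */
static const matcher rules[] = { match_post, match_letters };

/* Returns the index of the rule that fires on s, or -1 if none matches.
   *len receives the length of the matched text. */
int pick_rule(const char *s, size_t *len)
{
    int best = -1;
    *len = 0;
    for (int i = 0; i < 2; i++) {
        size_t n = rules[i](s);
        /* Strictly longer wins (longest match); on a tie the
           earlier rule is kept (first rule wins). */
        if (n > *len) {
            *len = n;
            best = i;
        }
    }
    return best;
}
```

With these two rules, "post" selects rule 0 because both rules match four characters and the first one is preferred, while "postfix" selects rule 1 because the letters rule matches all seven characters.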
Scanner, parser, lex and yacc (recap)
source program → Scanner → token → Parser (scanner and parser both use the symbol table)
Lex spec (.l) → lex/flex → lex.yy.c; yacc spec (.y) → yacc/bison → y.tab.c
Skeleton of a yacc specification (.y file)
y.tab.c is generated by running > yacc x.y
x.y:
    %{
    < C global variables, prototypes, comments >
    %}
    [DEFINITION SECTION]
    %%
    [PRODUCTION RULES SECTION]
    %%
    < C auxiliary subroutines >
• %{ … %}: this part is embedded into y.tab.c.
• Definition section: contains token declarations. Tokens are recognized in the lexer.
• Production rules section: defines how to "understand" the input language and what actions to take for each "sentence".
• C auxiliary subroutines: any user code, for example a main function that calls the parser function yyparse().
Production Rules Section of yacc Spec File
The production rules section is a list of rules:
    nontermsym : symbol1 symbol2 … { actions }
               | symbol3 symbol4 … { actions }
               | …
               ;
Each line starting with | is an alternative.
Example:
    expr : expr '+' expr { $$ = $1 + $3; }
$$ is the value of the non-terminal on the left-hand side; $n is the value of the n-th symbol on the right-hand side.
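One way to picture $$, $1 and $3 is as slots on a value stack that the parser keeps alongside its parse stack. The sketch below models a single reduction by expr : expr '+' expr under that assumption; the names vstack, push_value, top_value, stack_depth and reduce_add are invented for the example, and a real yacc-generated parser manages its stacks differently in detail.

```c
#include <stddef.h>

#define STACK_MAX 64   /* arbitrary capacity; no overflow check in this sketch */

static double vstack[STACK_MAX];  /* semantic values of the symbols on the parse stack */
static size_t vtop = 0;           /* number of values currently on the stack */

void   push_value(double v) { vstack[vtop++] = v; }
double top_value(void)      { return vstack[vtop - 1]; }
size_t stack_depth(void)    { return vtop; }

/* Model of reducing by:  expr : expr '+' expr  { $$ = $1 + $3; }  */
void reduce_add(void)
{
    double *rhs = &vstack[vtop - 3];   /* rhs[0] is $1, rhs[1] is $2, rhs[2] is $3 */
    double result = rhs[0] + rhs[2];   /* the action: $$ = $1 + $3 */
    vtop -= 3;                         /* pop the three right-hand-side symbols */
    push_value(result);                /* push $$ as the value of the new expr */
}
```

After pushing 2, a placeholder for '+', and 3, calling reduce_add() leaves a single value, 5, on the stack: the three right-hand-side symbols have been replaced by the left-hand-side non-terminal.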
An example of rule section
    statement : expression { printf(" = %g\n", $1); }
              ;
    expression : expression '+' expression { $$ = $1 + $3; }
               | expression '-' expression { $$ = $1 - $3; }
               | expression '*' expression { $$ = $1 * $3; }
               | expression '/' expression { $$ = $1 / $3; }
               | NUMBER { $$ = $1; }
               ;
• This grammar is ambiguous (shift/reduce conflicts)! Think about how to parse the input "2+3*5".
• The conflicts can be resolved without transforming the grammar (next slide).
An example of rule section (cont)
Definition section:
    %left '+' '-'
    %left '*' '/'
• %left specifies the associativity (left-associative).
• Operators declared later have higher precedence.
• Defining each operator's precedence and associativity resolves the shift/reduce conflicts in the previous slide.
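The effect of these two declarations can be checked with a small hand-written evaluator. Note that this sketch uses precedence climbing, a different technique from yacc's table-driven conflict resolution; it is only meant to show what "'*' binds tighter than '+'" and "left-associative" mean for the result. The function names are hypothetical, and only unsigned numbers and the four operators are handled.

```c
#include <stdlib.h>

static const char *p;   /* current position in the expression text */

static void skip_blanks(void) { while (*p == ' ') p++; }

/* Precedence level of a binary operator; 0 means "not an operator". */
static int prec(char op)
{
    switch (op) {
    case '+': case '-': return 1;   /* %left '+' '-' : declared first, lower  */
    case '*': case '/': return 2;   /* %left '*' '/' : declared later, higher */
    default:            return 0;
    }
}

static double parse_number(void)
{
    char *end;
    double v;
    skip_blanks();
    v = strtod(p, &end);
    p = end;
    return v;
}

/* Parse operators whose precedence is at least min_prec. */
static double parse_expr(int min_prec)
{
    double lhs = parse_number();
    for (;;) {
        char op;
        int pr;
        double rhs;
        skip_blanks();
        op = *p;
        pr = prec(op);
        if (pr == 0 || pr < min_prec)
            return lhs;
        p++;
        /* Left associativity: the right operand may only contain
           operators of strictly higher precedence. */
        rhs = parse_expr(pr + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        }
    }
}

double eval(const char *text)
{
    p = text;
    return parse_expr(1);
}
```

Here eval("2+3*5") groups as 2+(3*5), and eval("10-4-3") groups as (10-4)-3, which is the same grouping the %left declarations ask yacc to choose.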
A case study: The Calculator
zcalc.y (processed with > yacc -d zcalc.y):
    %{
    #include "zcalc.h"
    %}
    %union { double dval; struct symtab *symp; }
    %token <symp> NAME
    %token <dval> NUMBER
    %left '+' '-'
    %type <dval> expression
    %%
    statement_list : statement '\n'
                   | statement_list statement '\n'
                   ;
    statement : NAME '=' expression { $1->value = $3; }
              | expression { printf(" = %g\n", $1); }
              ;
    expression : expression '+' expression { $$ = $1 + $3; }
               | expression '-' expression { $$ = $1 - $3; }
               | NUMBER { $$ = $1; }
               | NAME { $$ = $1->value; }
               ;
    %%
    struct symtab * symlook( char *s )
    { /* Looks up the symbol table and checks whether the symbol s is already there. If not, adds s to the symbol table. */ }
    int main() { yyparse(); return 0; }
zcalc.l:
    %{
    #include "zcalc.tab.h"
    #include "y.tab.h"
    %}
    %%
    ([0-9]+|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?) { yylval.dval = atof(yytext); return NUMBER; }
    [ \t] ;
    [a-zA-Z][a-zA-Z0-9]* { struct symtab *sp = symlook(yytext); yylval.symp = sp; return NAME; }
    %%
Note on the token header: yacc -d writes it as y.tab.h, while bison -d zcalc.y writes it as zcalc.tab.h; include the one your tool actually generates.
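The slide leaves the body of symlook() as a comment. A minimal sketch of what it could look like follows, assuming that zcalc.h declares struct symtab with a name and a value field and that a small fixed-size table is enough; the table size NSYMS, the array name table, and the const qualifier on the parameter are assumptions of this sketch, not taken from the slides.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NSYMS 20              /* assumed table capacity */

/* Assumed layout of the symbol table entry declared in zcalc.h. */
struct symtab {
    char  *name;              /* NULL marks an unused slot */
    double value;
};

static struct symtab table[NSYMS];

/* Return the entry for s, adding it to the table if it is not there yet. */
struct symtab *symlook(const char *s)
{
    struct symtab *sp;
    for (sp = table; sp < &table[NSYMS]; sp++) {
        if (sp->name != NULL && strcmp(sp->name, s) == 0)
            return sp;                        /* already present */
        if (sp->name == NULL) {               /* first free slot: add s here */
            sp->name = malloc(strlen(s) + 1);
            if (sp->name == NULL) {
                fprintf(stderr, "out of memory\n");
                exit(1);
            }
            strcpy(sp->name, s);
            sp->value = 0.0;
            return sp;
        }
    }
    fprintf(stderr, "too many symbols\n");
    exit(1);
}
```

Because entries are never removed, the used slots are always contiguous at the front of the table, so hitting the first free slot means the name is not present. The lexer's NAME rule can then store the returned pointer in yylval.symp, and the grammar actions read or assign $1->value.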
Hints to Lab #1 (Exercise 2)
1. How to recognize "prefix", "postfix" and "infix" in the lexer? Add these rules to your .l file:
    %%
    "prefix" { return PREFIX; }
    "postfix" { return POSTFIX; }
    "infix" { return INFIX; }
Remember to declare PREFIX, POSTFIX and INFIX as tokens (%token) in your .y file.
2. How to combine the three modes? Remember that you have to use three sets of grammar rules for the three modes, and a global variable to remember the current mode. For example, you may have the following grammar:
    prog : prog INFIX '\n' in_stmts
         | prog POSTFIX '\n' post_stmts
         | prog PREFIX '\n' pre_stmts
         ;
    expression : infix_expr | post_expr | pre_expr ;
    infix_expr : infix_expr '+' infix_expr { if (flag == 0) $$ = $1 + $3; }
               | ……
    pre_expr : '+' pre_expr pre_expr { if (flag == 1) $$ = $2 + $3; }
             | ……
    ……
Hints to Lab #1 (Exercise 4-5)
3. How to build and print the AST
The statements and expressions in your grammar are no longer of type double. As shown in the lab handout, the statement_list is a linked list, and an expression is a tree structure like this:
    struct EXP {
        struct EXP* exp1;
        struct EXP* exp2;
        struct OP operator;
    };
Some functions are associated with the linked list and the tree structure, such as linkNewStatement(), makeExpression(struct EXP* exp1, struct EXP* exp2, struct OP operator), etc. Remember that the action for each production in your yacc file can call any function you have declared. Just as a sentence is parsed recursively, your AST is built and traversed recursively.
Another important point: building the AST and printing the AST can NOT be done in one pass!
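A rough sketch of the two passes follows. The lab handout's definition of struct OP is not shown here, so this version replaces it with a plain char operator code and adds a value field for number leaves; both are assumptions made for the example, as are the helper names makeNumber, printExpression and evalExpression. Only makeExpression comes from the slide, and its third parameter is simplified.

```c
#include <stdio.h>
#include <stdlib.h>

/* Simplified node: 'op' stands in for the handout's struct OP (assumption). */
struct EXP {
    struct EXP *exp1;   /* left operand, NULL for a leaf  */
    struct EXP *exp2;   /* right operand, NULL for a leaf */
    char op;            /* '+', '-', '*', '/' or 0 for a number leaf */
    double value;       /* only meaningful for a leaf */
};

static struct EXP *newNode(void)
{
    struct EXP *e = calloc(1, sizeof *e);
    if (e == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    return e;
}

/* Pass 1 (called from yacc actions): build leaves and inner nodes. */
struct EXP *makeNumber(double value)
{
    struct EXP *e = newNode();
    e->value = value;
    return e;
}

struct EXP *makeExpression(struct EXP *exp1, struct EXP *exp2, char op)
{
    struct EXP *e = newNode();
    e->exp1 = exp1;
    e->exp2 = exp2;
    e->op = op;
    return e;
}

/* Pass 2 (after parsing has finished): walk the tree recursively. */
void printExpression(FILE *out, const struct EXP *e)
{
    if (e->op == 0) {                 /* leaf: print the number */
        fprintf(out, "%g", e->value);
        return;
    }
    fputc('(', out);
    printExpression(out, e->exp1);
    fprintf(out, " %c ", e->op);
    printExpression(out, e->exp2);
    fputc(')', out);
}

double evalExpression(const struct EXP *e)
{
    double a, b;
    if (e->op == 0)
        return e->value;
    a = evalExpression(e->exp1);
    b = evalExpression(e->exp2);
    switch (e->op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:  return a / b;           /* '/' */
    }
}
```

In a yacc action this might be used as { $$ = makeExpression($1, $3, '+'); }, with printExpression() called on the finished tree only after yyparse() returns, which matches the point that building and printing happen in separate passes.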