Lex and Yacc
COP 3402
General Compiler Infrastructure
Program source (stream of characters)
→ Scanner (tokenizer), generated by Lex
→ Tokens
→ Parser, generated by Yacc
→ Syntactic Structure
→ Semantic Routines
→ IR: Intermediate Representation (1)
→ Analysis/Transformations/optimizations
→ IR: Intermediate Representation (2)
→ Code Generator
→ Assembly code
Symbol and Attribute Tables (used across the phases)
Lex & Yacc
• Lex
  • generates C code for the lexical analyzer (scanner)
  • token patterns are specified by regular expressions
• Yacc
  • generates C code for an LALR(1) syntax analyzer (parser)
  • the grammar is specified by BNF rules
Lex
• lex is a program (generator) that generates lexical analyzers; it is widely used on Unix.
• It is mostly used together with the Yacc parser generator.
• Written by Eric Schmidt and Mike Lesk.
• It reads a specification of the lexical analyzer and outputs source code implementing that lexical analyzer in the C programming language.
• Lex reads patterns (regular expressions), then produces C code for a lexical analyzer that scans the input for text matching those patterns (tokens such as identifiers and numbers).
Example: Simple Calculator
• Computes the basic arithmetic operations
• Allows variables (single letters a-z) to be assigned and used
• Enough to illustrate the basic structure of Lex and Yacc programs
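Lex program structure
… definitions … %% … rules … %% … subroutines …

%{
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include "y.tab.h"    /* token codes LETTER and NUMBER, generated by yacc -d */
int c;
extern int yylval;
%}
%%
" "            ;   /* ignore blanks */
[a-z]          { c = yytext[0]; yylval = c - 'a'; return(LETTER); }
[0-9]+         { yylval = atoi(yytext); return(NUMBER); }
[^a-z0-9\b]    { c = yytext[0]; return(c); }   /* any other character is its own token */
%%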
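Pattern Matching and Action
[a-z] { c = yytext[0]; yylval = c - 'a'; return(LETTER); }
• Matches a single character in the range a-z; the matched text is in the buffer yytext
• Places the offset c - 'a' (0 for a, up to 25 for z) in yylval, which the parser pushes onto its value stack
[0-9]+ { yylval = atoi(yytext); return(NUMBER); }
• Matches a non-negative integer (a sequence of one or more digits 0-9)
• Places the integer value in yylval, which the parser pushes onto its value stack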
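To make the rules above concrete, here is a sketch of what the generated scanner does with them. It uses the POSIX <regex.h> interface instead of the DFA tables Lex actually builds. Every pattern is tried at the current position, the longest match is kept, and on a tie the earlier rule wins, which is why a blank is skipped rather than returned by the catch-all rule. These are Lex's matching rules. The function next_token and the token values 258/259 are hypothetical stand-ins for yylex() and the codes yacc -d writes to y.tab.h. The number rule is written [0-9]+ so that it can never match an empty string. The \b is dropped from the last pattern because a backslash is not an escape inside POSIX bracket expressions.

```c
#include <regex.h>
#include <stdlib.h>

/* Placeholder token codes (hypothetical); yacc -d would write the real ones to y.tab.h. */
enum { NUMBER = 258, LETTER = 259 };

/* The calculator's rules, in order; "^" anchors each match at the current position. */
static const char *rules[] = { "^ ", "^[a-z]", "^[0-9]+", "^[^a-z0-9]" };
#define NRULES (sizeof rules / sizeof rules[0])

/* Length of the match of pattern at the start of s, or -1 if it does not match.
   Recompiling on every call is slow but keeps the sketch short. */
static long match_len(const char *pattern, const char *s) {
    regex_t re;
    regmatch_t m;
    long len = -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;
    if (regexec(&re, s, 1, &m, 0) == 0) len = (long)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}

/* Hypothetical stand-in for yylex(): returns the next token code and advances *pos.
   Returns 0 at end of input (as yylex() does) and -1 if no rule matches. */
int next_token(const char *input, size_t *pos, int *lval) {
    while (input[*pos] != '\0') {
        const char *s = input + *pos;
        int best = -1;
        long best_len = 0;
        for (size_t i = 0; i < NRULES; i++) {
            long len = match_len(rules[i], s);
            if (len > best_len) {        /* strictly longer: ties keep the earlier rule */
                best = (int)i;
                best_len = len;
            }
        }
        if (best < 0) return -1;
        *pos += (size_t)best_len;
        switch (best) {
        case 0:  continue;                              /* " "  ;  (skip blank) */
        case 1:  *lval = s[0] - 'a'; return LETTER;     /* [a-z]  */
        case 2:  *lval = atoi(s); return NUMBER;        /* [0-9]+ (atoi stops at the first non-digit) */
        default: return (unsigned char)s[0];            /* any other character */
        }
    }
    return 0;
}
```

For example, scanning "x = 12+y\n" yields LETTER (value 23), '=', NUMBER (value 12), '+', LETTER (value 24), '\n', and finally 0.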
Yacc
• Grammars are described by rules using a variant of the Backus-Naur Form (BNF)
• Context-free grammars
• An LALR(1) parse table is generated automatically from the rules
• Actions are attached to the rules and executed each time a rule is reduced
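Yacc Program Structure
… definitions … %% … rules … %% … subroutines …

%{
#include <stdio.h>
int regs[26];                 /* values of the variables a-z */
int yylex(void);              /* provided by the Lex-generated scanner */
void yyerror(const char *s);
%}
%token NUMBER LETTER
%left '+' '-'
%left '*' '/'
%%
list:   /* empty */
    |   list stat '\n'
    |   list error '\n'     { yyerrok; }
    ;
stat:   expr                { printf("%d\n", $1); }
    |   LETTER '=' expr     { regs[$1] = $3; }
    ;
expr:   '(' expr ')'        { $$ = $2; }
    |   expr '+' expr       { $$ = $1 + $3; }
    |   expr '-' expr       { $$ = $1 - $3; }
    |   expr '*' expr       { $$ = $1 * $3; }
    |   expr '/' expr       { $$ = $1 / $3; }
    |   LETTER              { $$ = regs[$1]; }
    |   NUMBER              /* default action: $$ = $1 */
    ;
%%
int main(void) { return yyparse(); }
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
int yywrap(void) { return 1; }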
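The %left lines tell the parser how to resolve conflicts between operators. An operator declared on a later line binds tighter. For operators of equal precedence, a left-associative one is reduced before the next operator is shifted, so 8 - 3 - 2 means (8 - 3) - 2. The sketch below is not the LALR(1) table Yacc generates. It is a hand-written operator-precedence (shift-reduce) evaluator, swapped in to show the same shift/reduce decisions and the same arithmetic actions being run at each reduction. The names eval, prec and reduce are hypothetical. The input is assumed to be well formed (non-negative integers, + - * /, parentheses) with at most 64 pending values or operators.

```c
#include <ctype.h>

/* Precedence mirrors the %left lines: '*' '/' (declared later) bind tighter. */
static int prec(char op) {
    if (op == '*' || op == '/') return 2;
    if (op == '+' || op == '-') return 1;
    return 0;                            /* '(' never triggers a reduction */
}

/* Pop two values and one operator, push the result: the analogue of running
   {$$ = $1 op $3;} when the parser reduces expr: expr op expr. */
static void reduce(int *vals, int *nv, char *ops, int *no) {
    int rhs = vals[--*nv];
    int lhs = vals[--*nv];
    char op = ops[--*no];
    int r;
    switch (op) {
    case '+': r = lhs + rhs; break;
    case '-': r = lhs - rhs; break;
    case '*': r = lhs * rhs; break;
    default:  r = lhs / rhs; break;      /* '/': no division-by-zero check */
    }
    vals[(*nv)++] = r;
}

/* Evaluate a well-formed expression such as "(2 + 3) * 4". */
int eval(const char *s) {
    int vals[64], nv = 0;                /* value stack */
    char ops[64];                        /* operator stack */
    int no = 0;
    while (*s != '\0') {
        if (isspace((unsigned char)*s)) {
            s++;
        } else if (isdigit((unsigned char)*s)) {   /* shift a NUMBER */
            int n = 0;
            while (isdigit((unsigned char)*s)) n = n * 10 + (*s++ - '0');
            vals[nv++] = n;
        } else if (*s == '(') {
            ops[no++] = *s++;
        } else if (*s == ')') {
            while (ops[no - 1] != '(') reduce(vals, &nv, ops, &no);
            no--;                        /* expr: '(' expr ')' {$$ = $2;} */
            s++;
        } else {
            /* %left: reduce while the stacked operator binds at least as tightly,
               so equal precedence groups to the left; otherwise shift. */
            while (no > 0 && prec(ops[no - 1]) >= prec(*s)) reduce(vals, &nv, ops, &no);
            ops[no++] = *s++;
        }
    }
    while (no > 0) reduce(vals, &nv, ops, &no);
    return vals[0];
}
```

With this sketch, eval("8 - 3 - 2") returns 3 and eval("2 + 3 * 4") returns 14. If '-' were declared %right instead, the first input would group as 8 - (3 - 2) = 7.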
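Rule Reduction and Action
stat:   expr                { printf("%d\n", $1); }
    |   LETTER '=' expr     { regs[$1] = $3; }
    ;
expr:   expr '+' expr       { $$ = $1 + $3; }
    |   LETTER              { $$ = regs[$1]; }
    ;
• Left of the braces: the grammar rule; inside the braces: the action, executed when the rule is reduced
• "|" is the "or" operator, used to give multiple right-hand sides (RHS) for the same nonterminal
• $$ is the value of the left-hand side; $1, $2, $3 are the values of the first, second and third RHS symbols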
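Conceptually, the generated parser keeps a value stack alongside its state stack. The scanner's yylval is pushed with each shifted token. A reduction by a rule with k right-hand-side symbols reads $1 through $k from the top k entries, runs the action, pops them, and pushes $$. The sketch below replays these reductions by hand. ValueStack, dollar and the reduce_* helpers are hypothetical names; in bison and byacc the stack is typically reached through a pointer named yyvsp.

```c
/* yacc's default value type is int. */
typedef int YYSTYPE;

/* Hypothetical value stack; top is the number of entries in use. */
typedef struct { YYSTYPE v[128]; int top; } ValueStack;

int regs[26];   /* variables a..z, as in the calculator */

static void push(ValueStack *s, YYSTYPE x) { s->v[s->top++] = x; }

/* $i of a rule whose right-hand side has k symbols. */
static YYSTYPE dollar(const ValueStack *s, int k, int i) { return s->v[s->top - k + i - 1]; }

/* Pop the k right-hand-side values and push $$. */
static void reduce(ValueStack *s, int k, YYSTYPE result) { s->top -= k; push(s, result); }

/* expr: LETTER  {$$ = regs[$1];} */
static void reduce_letter(ValueStack *s) { reduce(s, 1, regs[dollar(s, 1, 1)]); }

/* expr: expr '+' expr  {$$ = $1 + $3;} */
static void reduce_add(ValueStack *s) { reduce(s, 3, dollar(s, 3, 1) + dollar(s, 3, 3)); }

/* stat: LETTER '=' expr  {regs[$1] = $3;}  (stat's own value is unused here) */
static void reduce_assign(ValueStack *s) {
    regs[dollar(s, 3, 1)] = dollar(s, 3, 3);
    reduce(s, 3, 0);
}
```

For the input b = a + c with regs[0] = 5 and regs[2] = 7, the parser pushes 1 (for b), a value for '=', and 0 (for a), then calls reduce_letter. It pushes a value for '+' and 2 (for c), calls reduce_letter again and then reduce_add, which leaves 12 on top. Finally reduce_assign stores 12 in regs[1].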
Further reading…
• "A Compact Guide to Lex & Yacc", Thomas Niemann (recommended)
• "Lex & Yacc", John R. Levine, Tony Mason and Doug Brown (O'Reilly)
• Lots of resources on the web
• Check our website for some suggestions
Conclusions
• Yacc and Lex are very helpful for building the compiler front end
• They save a lot of time compared with implementing the scanner and parser by hand
• Both take a mixture of "rules" and "C code" as input
• The C code they generate is compiled and linked together with the rest of the compiler code