lex(1) and flex(1)
Lex public interface
• FILE *yyin; /* set before calling yylex() */
• int yylex(); /* call once per token */
• char yytext[]; /* chars matched by yylex() */
• int yywrap(); /* end-of-file handler */
.l file format
    header
    %%
    body
    %%
    helper functions
Lex header
• C code inside %{ … %}
• prototypes for helper functions
• #includes that #define integer token categories
• Macro definitions, e.g.
    letter  [a-zA-Z]
    digit   [0-9]
    ident   {letter}({letter}|{digit})*
• Warning: macros are fraught with peril
Lex body
• Regular expressions with semantic actions
    " "      { /* discard */ }
    {ident}  { return IDENT; }
    "*"      { return ASTERISK; }
    "."      { return PERIOD; }
• Match the longest r.e. possible
• Break ties with whichever appears first
• If it fails to match: copy unmatched to stdout
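The two matching rules above (longest match wins, ties go to the earlier rule) can be sketched in plain C. This is a hand-written illustration, not what lex generates: the rule table, the `pick_rule` helper, and the extra `"if"` keyword rule (added only so that a tie can occur) are all hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token categories. The values and KW_IF are hypothetical, not from the slides. */
enum { NO_MATCH = -1, KW_IF, IDENT, ASTERISK, PERIOD };

/* A rule is a literal string or, when literal is NULL, the {ident} pattern. */
struct rule { const char *literal; int category; };

/* Listed in the order the rules would appear in the .l file. */
static const struct rule rules[] = {
    { "if", KW_IF },
    { NULL, IDENT },
    { "*",  ASTERISK },
    { ".",  PERIOD },
};

/* Length of {letter}({letter}|{digit})* at s, or 0 (assumes the "C" locale). */
static size_t ident_len(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

/* Length matched by one rule at s, or 0 if it does not match. */
static size_t rule_len(const struct rule *r, const char *s) {
    if (r->literal == NULL) return ident_len(s);
    size_t n = strlen(r->literal);
    return strncmp(s, r->literal, n) == 0 ? n : 0;
}

/* Return the category of the winning rule and store the match length. */
int pick_rule(const char *s, size_t *len_out) {
    int best = NO_MATCH;
    size_t best_len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rule_len(&rules[i], s);
        /* Strictly greater: on a tie the earlier rule is kept. */
        if (n > best_len) { best = rules[i].category; best_len = n; }
    }
    *len_out = best_len;
    return best;
}
```

With this table, `pick_rule("if x", &n)` selects `KW_IF` because both `"if"` and `{ident}` match two characters and `"if"` comes first, while `pick_rule("iffy", &n)` selects `IDENT` because the identifier pattern matches more text.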
Lex helper functions
• Follow the rules of ordinary C code
• Compute lexical attributes
• Do stuff the regular expressions can't do
• Write a yywrap() to switch files on EOF
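A minimal sketch of a file-switching yywrap(), assuming the remaining input files are kept in a NULL-terminated list. The `set_input_files` helper and the `next_file` variable are hypothetical names. `yyin` is defined here only so the sketch compiles on its own; in a real scanner the generated code already provides it, so that line would be dropped.

```c
#include <stdio.h>

/* Stand-in for the scanner's input stream (normally supplied by lex). */
FILE *yyin;

/* Hypothetical: NULL-terminated list of file names still to be scanned. */
static const char **next_file;

void set_input_files(const char **files) {
    next_file = files;
}

/*
 * Called by the scanner at end of input.
 * Returns 0 if yyin now points at another file (keep scanning),
 * or 1 if there is nothing left (yylex() should report end of input).
 */
int yywrap(void) {
    if (yyin != NULL && yyin != stdin) {
        fclose(yyin);
        yyin = NULL;
    }
    while (next_file != NULL && *next_file != NULL) {
        const char *name = *next_file++;
        yyin = fopen(name, "r");
        if (yyin != NULL)
            return 0;          /* switched to the next file */
        perror(name);          /* skip files that cannot be opened */
    }
    return 1;                  /* no more input */
}
```

A driver would call `set_input_files(argv + 1)` and open the first file (for example by calling `yywrap()` once) before the first call to `yylex()`.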
Lex regular expressions
• \c escapes for most operators
• "s" match C string as-is (superescape)
• r{m,n} match r between m and n times
• r/s match r when s follows
• ^r match r when at beginning of line
• r$ match r when at end of line
struct token
    struct token {
        int category;
        char *text;
        int linenumber;
        int column;
        char *filename;
        union literal value;
    };
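One way a helper function might fill in such a token is sketched below. The slides do not define `union literal`, so its members here are assumptions, and `make_token` and `free_token` are hypothetical helper names. The text is copied because yytext is overwritten by the next call to yylex().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical members: the slides leave union literal undefined. */
union literal { int ival; double dval; char *sval; };

struct token {
    int category;
    char *text;
    int linenumber;
    int column;
    char *filename;
    union literal value;
};

/* Portable stand-in for strdup(): returns a heap copy of s, or NULL. */
static char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL) memcpy(p, s, n);
    return p;
}

/* Build a token from the matched text and its position; value is zero-filled. */
struct token *make_token(int category, const char *text,
                         int linenumber, int column, const char *filename) {
    struct token *t = calloc(1, sizeof *t);
    if (t == NULL) return NULL;
    t->category = category;
    t->text = copy_string(text);
    t->linenumber = linenumber;
    t->column = column;
    t->filename = copy_string(filename);
    if (t->text == NULL || t->filename == NULL) {
        free(t->text);
        free(t->filename);
        free(t);
        return NULL;
    }
    return t;
}

/* Release a token created by make_token(). */
void free_token(struct token *t) {
    if (t == NULL) return;
    free(t->text);
    free(t->filename);
    free(t);
}
```

Inside a semantic action this could be called as `make_token(IDENT, yytext, lineno, col, fname)`, where the line, column, and file name are variables the scanner author maintains.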
"string removal tool"
    %%
    "zap me"
whitespace trimmer
    %%
    [ \t]+     putchar(' ');
    [ \t]+$    /* drop entirely */
string replacement
    %%
    username   printf("%s", getlogin());
Line/character counter
    %{
    int lines = 0, chars = 0;
    %}
    %%
    \n    { ++lines; ++chars; }
    .     ++chars;
    %%
    int main(void) {
        yylex();
        printf("lines: %d chars: %d\n", lines, chars);
        return 0;
    }
Example: C reals
• Is it: [0-9]*\.[0-9]*
• Is it: ([0-9]+\.[0-9]*|[0-9]*\.[0-9]+)
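The two candidates can be compared by trying them on sample strings. The sketch below uses POSIX `regcomp`/`regexec` rather than lex itself, since the operators involved behave the same way in POSIX extended regular expressions. It assumes the dot is meant literally (written `\.`), and `full_match` is a hypothetical helper that anchors the pattern so the whole string must match.

```c
#include <regex.h>
#include <stdio.h>

/*
 * Return 1 if the whole of text matches pattern, 0 if it does not,
 * and -1 if the pattern is too long or fails to compile.
 */
int full_match(const char *pattern, const char *text) {
    char anchored[256];
    regex_t re;
    int n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored) return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0) return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* First candidate: digits are optional on both sides of the dot. */
const char *REAL_LOOSE = "[0-9]*\\.[0-9]*";

/* Second candidate: at least one digit is required on one side. */
const char *REAL_STRICT = "[0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+";
```

Under these assumptions the first candidate accepts a lone "." while the second rejects it, and both accept "3.14", "1." and ".5". Written without the backslash, the dot matches any character except newline, so `[0-9]*.[0-9]*` would also accept a string such as "1x2".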