Flex
• A lexical analyzer generator
  • Generates a scanner procedure directly from regular expressions and user-written procedures
• Steps to using flex
  • Create a description (rules) file for flex to operate on
  • Run flex on the input file. flex produces a C file called lex.yy.c containing the scanning function yylex().
  • Run the C compiler on lex.yy.c to produce a lexical analyzer
Flex Files and Procedure
• Rule file (*.l) → flex compiler → lex.yy.c (scanner in C code)
• lex.yy.c → C compiler (linked with -lfl) → scanner.exe
• Test file → scanner.exe → tokens
Flex Programs
The flex input file consists of three sections, separated by lines containing just %%:
    %{
    auxiliary declarations
    %}
    regular definitions
    %%
    translation rules
    %%
    auxiliary procedures
Regular Expression Definitions Section
• The definitions section contains declarations of simple name definitions to simplify the scanner specification.
• Name definitions have the form: name definition
• Example:
    DIGIT [0-9]
    ID [a-z][a-z0-9]*
Translation Rules Section
    P1 action1
    P2 action2
    ...
    Pn actionn
where each Pi is a regular expression and each actioni is a C program segment.
Auxiliary Procedure Section
• This section is simply copied to lex.yy.c.
• It is optional; if it is missing, the second %% in the input file may be omitted.
• In the definitions and rules sections, any indented text, or text enclosed in %{ and %}, is copied to the output (with the %{ and %} removed).
Rules
• Look for the longest token
  • Example: number
• Look for the first-listed pattern that matches the longest token
  • Example: keywords and identifiers
• List frequently occurring patterns first
  • Example: white space
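The two matching rules above can be illustrated with a small hand-written sketch in C. It is not what flex generates (flex builds a table-driven automaton); the token codes, the three matcher functions, and next_token() are all hypothetical names chosen for this illustration. Each matcher reports how many leading characters it accepts, the longest match wins, and a tie goes to the rule listed first.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes for this sketch. */
enum { TOK_NONE = 0, TOK_IF, TOK_ID, TOK_NUMBER };

/* Each matcher returns how many leading characters of s it matches (0 = none). */
static size_t match_if(const char *s) {
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_id(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

static size_t match_number(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

struct rule { int token; size_t (*match)(const char *); };

/* Order matters: the keyword "if" is listed before the identifier pattern. */
static const struct rule rules[] = {
    { TOK_IF,     match_if     },
    { TOK_ID,     match_id     },
    { TOK_NUMBER, match_number },
};

/* Returns the token at the start of s and stores its length in *len. */
int next_token(const char *s, size_t *len) {
    assert(s != NULL && len != NULL);
    int best = TOK_NONE;
    *len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(s);
        if (n > *len) {   /* strictly longer wins; a tie keeps the earlier rule */
            *len = n;
            best = rules[i].token;
        }
    }
    return best;
}
```

With this sketch, the input "if" is reported as TOK_IF (both the keyword and identifier patterns match two characters, and the keyword is listed first), while "ifx" is reported as TOK_ID because the identifier pattern matches a longer prefix.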
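Rules
• View keywords as exceptions to the rule for identifiers
  • Construct a keyword table
• Lookahead operator: r1/r2 matches a string in r1 only if it is followed by a string in r2
  • Fortran example (blanks are not significant):
      DO 5 I = 1.25    (an assignment to the variable DO5I)
      DO 5 I = 1,25    (the start of a DO loop)
  • Pattern that recognizes the keyword DO:
      DO/({letter}|{digit})* = ({letter}|{digit})*,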
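A keyword table can be sketched in C as follows. The idea is that a single identifier pattern matches every word, and the action then looks the lexeme up in a table to decide whether it is a keyword. The token codes, the table contents, and classify_word() are assumptions made for this illustration, not part of flex.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token codes for this sketch. */
enum { TOK_ID = 1, TOK_IF, TOK_ELSE, TOK_WHILE };

struct keyword { const char *text; int token; };

/* The keyword table: every reserved word with its token code. */
static const struct keyword keywords[] = {
    { "if",    TOK_IF    },
    { "else",  TOK_ELSE  },
    { "while", TOK_WHILE },
};

/* Returns the keyword's token code, or TOK_ID if the lexeme is not a keyword. */
int classify_word(const char *lexeme) {
    assert(lexeme != NULL);
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(lexeme, keywords[i].text) == 0)
            return keywords[i].token;
    }
    return TOK_ID;   /* not in the table: an ordinary identifier */
}
```

In a flex specification, a function like this would typically be called from the action of the identifier rule, for example with yytext as the argument. A linear search is enough for a handful of keywords; a larger table might use a sorted array or a hash table instead.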
Functions and Variables
• yylex()
  • a function implementing the lexical analyzer and returning the token matched
• yytext
  • a global pointer variable pointing to the lexeme matched
• yyleng
  • a global variable giving the length of the lexeme matched
• yylval
  • an external global variable storing the attribute of the token
Example
    %{
    /* Token codes. ENDFILE is used instead of EOF, which <stdio.h> already defines. */
    #define ENDFILE 0
    #define LE 25
    ...
    %}
    delim    [ \t\n]
    ws       {delim}+
    letter   [A-Za-z]
    digit    [0-9]
    id       {letter}({letter}|{digit})*
    number   {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
    %%
    {ws}      { /* no action and no return */ }
    if        { return (IF); }
    else      { return (ELSE); }
    {id}      { yylval = install_id(); return (ID); }
    {number}  { yylval = install_num(); return (NUMBER); }
    "<="      { yylval = LE; return (RELOP); }
    "=="      { yylval = EQ; return (RELOP); }
    ...
    <<EOF>>   { return (ENDFILE); }
    %%
    int install_id() { ... }
    int install_num() { ... }
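To make the number definition in the example concrete, here is a hand-written C sketch of a recognizer for the same pattern, {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?. The function names digits() and match_number() are hypothetical; flex itself compiles the pattern into an automaton rather than code like this. The function returns the length of the longest prefix of its argument that matches the pattern.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Number of leading decimal digits in s. */
static size_t digits(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Length of the longest prefix of s matching
   {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?   (0 if there is no match). */
size_t match_number(const char *s) {
    assert(s != NULL);
    size_t n = digits(s);
    if (n == 0) return 0;              /* {digit}+ is mandatory */

    /* Optional fraction: a '.' that must be followed by at least one digit. */
    if (s[n] == '.') {
        size_t f = digits(s + n + 1);
        if (f > 0) n += 1 + f;
    }

    /* Optional exponent: 'E', an optional sign, then at least one digit. */
    if (s[n] == 'E') {
        size_t k = n + 1;
        if (s[k] == '+' || s[k] == '-') k++;
        size_t e = digits(s + k);
        if (e > 0) n = k + e;
    }
    return n;
}
```

Note how an incomplete optional part is simply left out of the match: for the input "12." only "12" is matched, and for "7E+" only "7" is matched.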
Lexical Error Recovery
• Error: none of the patterns matches a prefix of the remaining input
• Panic mode error recovery
  • Delete successive characters from the remaining input until pattern matching can continue
• Error repair:
  • Delete an extraneous character
  • Insert a missing character
  • Replace an incorrect character
  • Transpose two adjacent characters
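Panic mode recovery can be sketched in C as below. This is a simplified illustration with three assumed patterns (white space, identifiers, unsigned integers); match_any() and scan_with_panic_mode() are hypothetical names. Whenever no pattern matches at the current position, one character is deleted and matching is retried.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the token starting at s, or 0 if no pattern matches there.
   Patterns in this sketch: white space, identifiers, unsigned integers. */
static size_t match_any(const char *s) {
    size_t n = 0;
    if (isspace((unsigned char)s[0])) {
        while (isspace((unsigned char)s[n])) n++;
    } else if (isalpha((unsigned char)s[0])) {
        while (isalnum((unsigned char)s[n])) n++;
    } else if (isdigit((unsigned char)s[0])) {
        while (isdigit((unsigned char)s[n])) n++;
    }
    return n;
}

/* Scans the whole input. Returns the number of tokens found (white space is
   not counted) and stores the number of deleted characters in *skipped. */
size_t scan_with_panic_mode(const char *s, size_t *skipped) {
    assert(s != NULL && skipped != NULL);
    size_t tokens = 0;
    *skipped = 0;
    while (*s != '\0') {
        size_t n = match_any(s);
        if (n == 0) {
            /* Panic mode: delete one character and try to match again. */
            s++;
            (*skipped)++;
            continue;
        }
        if (!isspace((unsigned char)s[0])) tokens++;
        s += n;
    }
    return tokens;
}
```

A real scanner would normally also report each deleted character, for example with its line number, so that the user can locate the error.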
Maintaining Line Number
• Flex can maintain the number of the current line in the global variable yylineno when the following option is given in the first (definitions) section:
    %option yylineno
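Conceptually, keeping a line number means counting the newline characters in every lexeme that is matched. The C sketch below shows that bookkeeping done by hand; update_lineno() is a hypothetical helper for illustration and is not a flex function.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the line number after consuming a lexeme of the given length:
   the old line number plus one for every newline inside the lexeme. */
int update_lineno(int lineno, const char *lexeme, size_t len) {
    assert(lexeme != NULL);
    for (size_t i = 0; i < len; i++) {
        if (lexeme[i] == '\n')
            lineno++;
    }
    return lineno;
}
```

Without the option, a scanner could call a helper like this from its actions, passing yytext and yyleng; with %option yylineno the counting is left to flex.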
Flex : Regular Expression
    x          match the character 'x'
    .          any character (byte) except newline
    [xyz]      a "character class"; in this case, the pattern matches either an 'x', a 'y', or a 'z'
    [abj-oZ]   a "character class" with a range in it; matches an 'a', a 'b', any letter from 'j' through 'o', or a 'Z'
    [^A-Z]     a "negated character class", i.e., any character but those in the class; in this case, any character EXCEPT an uppercase letter
    [^A-Z\n]   any character EXCEPT an uppercase letter or a newline
Flex : Regular Expression
    r*              zero or more r's, where r is any regular expression
    r+              one or more r's
    r?              zero or one r's (that is, "an optional r")
    r{2,5}          anywhere from two to five r's
    r{2,}           two or more r's
    r{4}            exactly 4 r's
    {name}          the expansion of the "name" definition (see above)
    "[xyz]\"foo"    the literal string: [xyz]"foo
    \X              if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v', then the ANSI-C interpretation of \x; otherwise, a literal 'X' (used to escape operators such as '*')
Flex : Regular Expression
    \0      a NUL character (ASCII code 0)
    \123    the character with octal value 123
    \x2a    the character with hexadecimal value 2a
    (r)     match an r; parentheses are used to override precedence
    rs      the regular expression r followed by the regular expression s; called "concatenation"
    r|s     either an r or an s
    ^r      an r, but only at the beginning of a line (i.e., when just starting to scan, or right after a newline has been scanned)
    r$      an r, but only at the end of a line (i.e., just before a newline); equivalent to "r/\n"
Execute Flex
• Create a directory in Cygwin
  • Example: /usr/src/compiler
• Download calc.l or c.l
• Execute flex
  • flex calc.l
  • lex.yy.c will be generated