Design of a lexical analyzer using LEX
• Lex helps to specify lexical analyzers by writing regular expressions that describe the patterns for tokens.
• The input notation for the Lex tool is the Lex language, and the tool itself is referred to as the Lex compiler.
• The Lex compiler transforms the input patterns into a transition diagram and generates the code in a file named lex.yy.c.
Use of Lex
Fig. Creating a lexical analyzer using Lex
• The input file, lex.l, describes the lexical analyzer to be generated.
• lex.l is run through the Lex compiler to produce a C program, lex.yy.c, that simulates the transition diagram for the input patterns.
• lex.yy.c is compiled by the C compiler to produce a.out, the working lexical analyzer, which turns a sequence of input characters into a stream of tokens.
• The compiled lexical analyzer is normally used as a subroutine of the parser. It returns an integer, which is a code for one of the possible token names.
• The attribute value is placed in a global variable yylval, which is shared by the lexical analyzer and the parser.
Lex specifications
• The general format of a Lex source file is:
    {declarations}
    %%
    {translation rules}
    %%
    {user subroutines}
• Declarations: variables, manifest constants (identifiers declared to stand for a constant), and regular definitions.
• Translation rules: each rule has the form pattern {action}
    pattern -> a regular expression, which may use the regular definitions of the declarations section
    action -> the action the lexical analyzer should take when pattern Pi matches a lexeme
• User subroutines: auxiliary procedures needed by the actions.
When called by the parser, the lexical analyzer:
• Reads its remaining input until it finds the longest prefix of the input that matches one of the patterns Pi.
• Then executes the associated action Ai. (Typically Ai returns to the parser. If it does not, for example because Pi describes whitespace or a comment, the lexical analyzer proceeds to find additional lexemes until one of the corresponding actions causes a return to the parser.)
• Returns a single value, the token name, to the parser.
• Uses the shared integer variable yylval to pass additional information about the lexeme found.
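The calling convention above can be sketched in plain C. This is a hand-written stand-in for the function Lex would generate, not Lex output: the token codes, set_input() and count_tokens() are hypothetical names chosen for illustration, and the attribute stored in yylval for an identifier is simply the lexeme length.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; in practice the parser defines them. */
enum { END = 0, ID = 256, NUMBER = 257 };

int yylval;                      /* attribute value shared with the parser */
static const char *input = "";   /* remaining input (stand-in for the input stream) */

/* Hypothetical helper: point the lexer at a new input string. */
void set_input(const char *s) { input = s; }

/* Hand-written stand-in for the lexer; returns one token code per call. */
int yylex(void) {
    for (;;) {
        /* Whitespace "action" does not return, so scanning continues. */
        while (*input == ' ' || *input == '\t' || *input == '\n')
            input++;
        if (*input == '\0')
            return END;
        if (isdigit((unsigned char)*input)) {
            int value = 0;
            while (isdigit((unsigned char)*input))   /* longest run of digits */
                value = value * 10 + (*input++ - '0');
            yylval = value;                          /* attribute: numeric value */
            return NUMBER;
        }
        if (isalpha((unsigned char)*input)) {
            const char *start = input;
            while (isalnum((unsigned char)*input))   /* longest letter/digit run */
                input++;
            yylval = (int)(input - start);           /* attribute: lexeme length */
            return ID;
        }
        input++;   /* this sketch silently skips any other character */
    }
}

/* Parser-side loop: keep asking for tokens until the lexer reports END. */
int count_tokens(const char *s) {
    int n = 0;
    set_input(s);
    while (yylex() != END)
        n++;
    return n;
}
```

Each call to yylex() hands back exactly one token code, and the parser reads yylval right after the call if it needs the attribute.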
int installID() {
    /* Function to install the lexeme, whose first character is pointed to
       by yytext and whose length is yyleng, into the symbol table and
       return a pointer to its entry there */
}
int installNum() {
    /* Similar to installID(), but puts numerical constants into a
       separate table */
}
The actions taken when id is matched are:
• installID() is called to place the lexeme found in the symbol table.
• This function returns a pointer to the symbol-table entry, which is placed in the global variable yylval, where it can be used by the parser or a later component of the compiler. installID() has two variables available to it that are set automatically by the lexical analyzer that Lex generates:
    • yytext: pointer to the beginning of the lexeme
    • yyleng: length of the lexeme found
• The token name ID is returned to the parser.
• The action taken when a lexeme matches number is similar, using installNum().
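A minimal sketch of what installID() and the id action might look like, under stated assumptions: the symbol table is a small fixed-size array, the "pointer" stored in yylval is an index into that array, and yytext, yyleng and yylval are declared by hand here because no generated lexer is present. The names symtab, MAX_SYMBOLS, MAX_LEXEME and id_action() are hypothetical.

```c
#include <string.h>
#include <stddef.h>

#define MAX_SYMBOLS 64   /* assumed table capacity */
#define MAX_LEXEME  32   /* assumed maximum lexeme length, including '\0' */

enum { ID = 256 };       /* hypothetical token code */

/* Normally set by the generated lexer; set by hand in this sketch. */
const char *yytext;      /* first character of the current lexeme */
int yyleng;              /* length of the current lexeme */
int yylval;              /* attribute value passed to the parser */

static char symtab[MAX_SYMBOLS][MAX_LEXEME];
static int symcount = 0;

/* Returns the table index of the lexeme yytext[0..yyleng-1], inserting it
   if absent. Returns -1 if the lexeme is too long or the table is full. */
int installID(void) {
    if (yyleng <= 0 || yyleng >= MAX_LEXEME)
        return -1;
    for (int i = 0; i < symcount; i++) {
        if ((int)strlen(symtab[i]) == yyleng &&
            memcmp(symtab[i], yytext, (size_t)yyleng) == 0)
            return i;                       /* already in the table */
    }
    if (symcount == MAX_SYMBOLS)
        return -1;
    memcpy(symtab[symcount], yytext, (size_t)yyleng);
    symtab[symcount][yyleng] = '\0';        /* yytext itself is not copied past yyleng */
    return symcount++;
}

/* The action for the id pattern: record the entry, return the token name. */
int id_action(void) {
    yylval = installID();
    return ID;
}
```

Copying exactly yyleng characters matters because the lexeme sits inside the input buffer and is followed by the rest of the input, not by a terminator the action can rely on in this sketch.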
Conflict resolution in Lex
• When several prefixes of the input match one or more patterns:
    • Prefer the longest prefix to a shorter prefix, e.g. <= over <.
    • If the longest possible prefix matches two or more patterns, prefer the pattern listed first in the Lex program.
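The two rules can be illustrated with a small hand-written matcher in C. This is only a sketch of the selection policy, not how Lex implements it (Lex builds a single automaton for all patterns): each hypothetical rule is a function that reports how long a prefix it matches, and the rules array plays the role of the ordered rule list.

```c
#include <ctype.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical token codes. */
enum { NONE = 0, IF = 256, ID, LT, LE };

/* A rule reports the length of the longest prefix of s it matches (0 = none). */
typedef size_t (*matcher)(const char *s);

static size_t match_literal(const char *s, const char *lit) {
    size_t n = strlen(lit);
    return strncmp(s, lit, n) == 0 ? n : 0;
}
static size_t match_if(const char *s) { return match_literal(s, "if"); }
static size_t match_lt(const char *s) { return match_literal(s, "<"); }
static size_t match_le(const char *s) { return match_literal(s, "<="); }
static size_t match_id(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

/* Order matters: the keyword rule is listed before the identifier rule. */
struct rule { matcher match; int token; };
static const struct rule rules[] = {
    { match_if, IF }, { match_id, ID }, { match_lt, LT }, { match_le, LE },
};

/* Picks the longest match; on a tie, the rule listed first wins. */
int next_token(const char *s, size_t *len) {
    int best = NONE;
    *len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(s);
        if (n > *len) {          /* strictly longer only, so ties keep the earlier rule */
            *len = n;
            best = rules[i].token;
        }
    }
    return best;
}
```

With this ordering, the input if( yields the keyword because both rules match two characters and the keyword is listed first, while iffy yields an identifier because the identifier rule matches a longer prefix.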
The lookahead operator
• Lex automatically reads one character ahead of the last character that forms the lexeme, and then retracts the input so that only the lexeme itself is consumed from the input stream.
• Sometimes we want a certain pattern to be matched to the input only when it is followed by certain other characters.
• Use / in the pattern to indicate the end of the part of the pattern that matches the lexeme.
• What follows / is an additional pattern that must be matched before we can decide that the token in question has been seen. What matches this second pattern is not part of the lexeme.
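e.g. in Fortran, where keywords are not reserved:
    IF(I,J)=3               (IF is an array name)
    IF (condition) THEN….   (IF is a keyword)
The keyword IF is always followed by a left parenthesis, some text (the condition), a right parenthesis, and a letter, so it can be recognized with the pattern:
    IF / \( .* \) {letter}
Only IF is the lexeme; the parenthesized text and the letter stay in the input. (The spaces are for readability; in an actual Lex program the pattern is written without them.)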
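The effect of the / operator in this example can be imitated by hand in C. This is a sketch under stated assumptions, not what Lex generates: match_if_keyword() is a hypothetical function, blanks are assumed to have been removed from the input already, and .* is taken to mean any characters up to the end of the line.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 2 (the length of the lexeme IF) when s starts with IF and the
   trailing context "( ... ) letter" follows on the same line; else 0.
   The trailing context is only inspected, never consumed. */
size_t match_if_keyword(const char *s) {
    if (s[0] != 'I' || s[1] != 'F')
        return 0;
    const char *p = s + 2;
    if (*p != '(')
        return 0;
    /* Scan the rest of the line for a ')' directly followed by a letter. */
    for (p++; *p != '\0' && *p != '\n'; p++) {
        if (*p == ')' && isalpha((unsigned char)p[1]))
            return 2;
    }
    return 0;
}
```

For IF(I,J)=3 the closing parenthesis is followed by =, so the function reports no match and an identifier rule would be free to treat IF as an array name.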