Implementing a lexical analyzer using finite automata
We are given the following regular definition:
if -> if
then -> then
else -> else
relop -> < | <= | = | <> | > | >=
id -> letter (letter | digit)*
num -> digit+ (. digit+)? (E (+|-)? digit+)?
letter -> [a-z] | [A-Z]
digit -> [0-9]
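As an illustration of how one of these definitions maps to code, here is a minimal sketch of a hand-written matcher for num. The function name match_num and its interface (return the length of the longest matching prefix, 0 if none) are assumptions made for this example and do not come from the slides.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the longest prefix of s
   that matches  num -> digit+ (. digit+)? (E (+|-)? digit+)?
   or 0 if no prefix matches. */
size_t match_num(const char *s)
{
    assert(s != NULL);
    size_t i = 0;

    /* digit+ : at least one digit is mandatory */
    if (!isdigit((unsigned char)s[i])) return 0;
    while (isdigit((unsigned char)s[i])) i++;

    /* (. digit+)? : take the fraction only if a digit follows the dot */
    if (s[i] == '.' && isdigit((unsigned char)s[i + 1])) {
        i++;
        while (isdigit((unsigned char)s[i])) i++;
    }

    /* (E (+|-)? digit+)? : take the exponent only if it is complete */
    if (s[i] == 'E') {
        size_t j = i + 1;
        if (s[j] == '+' || s[j] == '-') j++;
        if (isdigit((unsigned char)s[j])) {
            while (isdigit((unsigned char)s[j])) j++;
            i = j;
        }
    }
    return i;
}
```

With this sketch, match_num("12.5E+3") returns 7, while match_num("12.") returns 2 because the optional fraction needs at least one digit after the dot.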
Recognize the keywords if, then, else and the lexemes relop, id, num
• delim -> blank | tab | newline
• ws -> delim+
• If a match for ws is found, the lexical analyzer does not return a token to the parser. It proceeds to find the token following the white space and returns that token to the parser.
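The white space rule can be sketched as a small helper that consumes ws and produces no token. The names is_delim and skip_ws are assumptions for this example; the slides handle the same case inside state 0 of the analyzer.

```c
#include <assert.h>
#include <stddef.h>

/* delim -> blank | tab | newline */
static int is_delim(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: skips  ws -> delim+  starting at pos and returns
   the index of the first character that is not a delimiter. No token is
   produced for the skipped characters. */
size_t skip_ws(const char *input, size_t pos)
{
    assert(input != NULL);
    while (is_delim(input[pos])) pos++;   /* stops at '\0' as well */
    return pos;
}
```

For example, skip_ws("  \t\nif", 0) returns 4, the position where the keyword if begins.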
Transition diagrams
• A transition diagram (TD) depicts the actions that take place when the lexical analyzer is called by the parser to get the next token
• A TD keeps track of information about the characters that are seen as the forward pointer scans the input
• Positions in a TD are drawn as circles called states
• States are connected by arrows called edges
• Edges leaving state s have labels indicating the input characters that can appear next after the transition diagram has reached state s
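[Figure: transition diagram for identifiers. start -> state 0; state 0 -> state 1 on letter; state 1 loops on letter/digit; state 1 -> state 2 on delimiter; state 2 is marked *]
• Start state: the state where control resides when we begin to recognize a token.
• Failure: no valid transition leaves the current state for the input character.
• Accepting state: a state in which a token has been found.
• * marks a state in which retraction of the forward pointer must take place.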
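The three-state identifier diagram can be sketched in C as follows. The function name scan_id and its interface are assumptions for this example. Any character that is not a letter or digit is treated as the delimiter, as the slides' own code does when it leaves state 10.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: runs the diagram on input starting at begin.
   Returns the length of the identifier lexeme, or 0 on failure
   (no valid transition leaves state 0). */
size_t scan_id(const char *input, size_t begin)
{
    assert(input != NULL);
    size_t forward = begin;   /* forward pointer */
    int state = 0;            /* start state */

    for (;;) {
        switch (state) {
        case 0: {
            char c = input[forward++];
            if (isalpha((unsigned char)c)) state = 1;
            else return 0;                 /* failure */
            break;
        }
        case 1: {
            char c = input[forward++];
            if (isalnum((unsigned char)c)) state = 1;   /* letter/digit loop */
            else state = 2;                             /* delimiter seen */
            break;
        }
        case 2:
            /* starred state: the delimiter is not part of the lexeme,
               so retract the forward pointer by one character */
            forward--;
            return forward - begin;
        }
    }
}
```

Here scan_id("count1 = 0", 0) returns 6: the blank after count1 moves the diagram to state 2, and the retraction hands that blank back to the input.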
There may be several transition diagrams
• If failure occurs while following one transition diagram, retract the forward pointer to where it was in the start state of this diagram and activate the next transition diagram
• If failure occurs in all transition diagrams, a lexical error is detected and error recovery routines are invoked
• e.g. in Fortran, DO 5 I=1.25 (an assignment to the variable DO5I) and DO 5 I=1,25 (the start of a DO loop) cannot be told apart until the period or the comma is read
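The idea of trying one diagram after another from the same starting position can be sketched as below. Everything named here (Kind, Diagram, next_kind and the three simplified diagram functions) is an assumption for illustration; the num diagram accepts integers only to keep the sketch short.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { T_RELOP, T_ID, T_NUM, T_ERROR } Kind;

/* Each diagram returns the matched length; 0 signals failure. */
typedef size_t (*Diagram)(const char *s);

static size_t relop_diagram(const char *s)
{
    if (s[0] == '<') return (s[1] == '=' || s[1] == '>') ? 2 : 1;
    if (s[0] == '>') return (s[1] == '=') ? 2 : 1;
    if (s[0] == '=') return 1;
    return 0;
}

static size_t id_diagram(const char *s)
{
    size_t i = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[i])) i++;
    return i;
}

static size_t num_diagram(const char *s)   /* integers only, for brevity */
{
    size_t i = 0;
    while (isdigit((unsigned char)s[i])) i++;
    return i;
}

/* Tries the diagrams in order. Every diagram starts again at 'begin',
   which plays the role of retracting the forward pointer after a failure. */
Kind next_kind(const char *input, size_t begin, size_t *length)
{
    static const Diagram diagrams[] = { relop_diagram, id_diagram, num_diagram };
    static const Kind kinds[] = { T_RELOP, T_ID, T_NUM };

    assert(input != NULL && length != NULL);
    for (size_t d = 0; d < 3; d++) {
        size_t n = diagrams[d](input + begin);
        if (n > 0) { *length = n; return kinds[d]; }
    }
    *length = 0;
    return T_ERROR;   /* all diagrams failed: lexical error */
}
```

A character such as '#' fails in all three diagrams, so next_kind reports T_ERROR, which is the point where a real analyzer would call its error recovery routine.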
Recognition of reserved words
• Initialize appropriately the symbol table in which information about identifiers is stored
• Enter the reserved words into the symbol table before any characters in the input are seen
• Make a note in the symbol table of the token to be returned when the keyword is identified
• The return statement next to the accepting state uses gettoken() and install_id() to obtain the token and the attribute value
• When a lexeme is identified, the symbol table is checked
• If the lexeme is found as a keyword, install_id() returns 0
• If it is an identifier, a pointer to the symbol table entry is returned
• gettoken() returns the corresponding token
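A minimal sketch of this scheme is given below. It makes several assumptions that are not in the slides: the table is a fixed-size array, install_id() and gettoken() take the lexeme as a parameter, the token codes are arbitrary, and the "pointer" to an entry is a 1-based index so that 0 stays free to mean keyword.

```c
#include <assert.h>
#include <string.h>

enum { IF = 256, THEN, ELSE, ID };   /* token codes; values are assumed */
#define MAX_SYMBOLS 64
#define MAX_LEXEME 32

struct entry { char lexeme[MAX_LEXEME]; int token; };
static struct entry symtab[MAX_SYMBOLS];
static int symbol_count = 0;

static int lookup(const char *lexeme)
{
    for (int i = 0; i < symbol_count; i++)
        if (strcmp(symtab[i].lexeme, lexeme) == 0) return i;
    return -1;
}

static int insert(const char *lexeme, int token)
{
    assert(symbol_count < MAX_SYMBOLS && strlen(lexeme) < MAX_LEXEME);
    strcpy(symtab[symbol_count].lexeme, lexeme);
    symtab[symbol_count].token = token;
    return symbol_count++;
}

/* Enter the reserved words before any input is seen. */
void init_symbol_table(void)
{
    symbol_count = 0;
    insert("if", IF);
    insert("then", THEN);
    insert("else", ELSE);
}

/* Returns 0 for a keyword, otherwise the 1-based position of the
   identifier's entry (created on first sight). */
int install_id(const char *lexeme)
{
    int i = lookup(lexeme);
    if (i < 0) i = insert(lexeme, ID);
    return symtab[i].token == ID ? i + 1 : 0;
}

/* Returns the token noted in the table, or ID for an unknown lexeme. */
int gettoken(const char *lexeme)
{
    int i = lookup(lexeme);
    return i < 0 ? ID : symtab[i].token;
}
```

After init_symbol_table(), install_id("if") returns 0 and gettoken("if") returns IF, while install_id("count") creates a new entry and gettoken("count") returns ID.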
Recognition of numbers
• When the accepting state is reached, call a procedure install_num() that enters the lexeme into the table of numbers and returns a pointer to the created entry
• The lexical analyzer then returns the token NUM
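The accepting action for numbers might look like the sketch below. The table layout, the parameters of install_num(), the helper names accept_num and num_lexeme, the Token fields and the value of NUM are all assumptions for this example; the "pointer" to the entry is represented as an array index.

```c
#include <assert.h>
#include <string.h>

#define MAX_NUMS 64
#define MAX_NUM_LEN 32
enum { NUM = 300 };   /* token code; value is assumed */

typedef struct { int name; int attribute; } Token;

static char num_table[MAX_NUMS][MAX_NUM_LEN];
static int num_count = 0;

/* Copies the lexeme into the table of numbers and returns the index
   of the newly created entry. */
int install_num(const char *lexeme, size_t length)
{
    assert(lexeme != NULL && num_count < MAX_NUMS && length < MAX_NUM_LEN);
    memcpy(num_table[num_count], lexeme, length);
    num_table[num_count][length] = '\0';
    return num_count++;
}

/* Action for the accepting state: token NUM, attribute = table entry. */
Token accept_num(const char *lexeme, size_t length)
{
    Token token;
    token.name = NUM;
    token.attribute = install_num(lexeme, length);
    return token;
}

/* Reads an entry back, e.g. for a later compiler phase. */
const char *num_lexeme(int entry)
{
    assert(entry >= 0 && entry < num_count);
    return num_table[entry];
}
```

Calling accept_num("3.14)", 4) stores the four-character lexeme 3.14 and returns a token whose name is NUM and whose attribute identifies the stored entry.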
Implementing lexical analyzer
Token nexttoken( )
{
    while (1)
    {
        switch (state) {
        case 0: c = nextchar();
            /* skip white space and move the start of the lexeme past it */
            if (c == blank || c == tab || c == newline) {
                state = 0;
                lexeme_beginning++;
            }
            else if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else state = fail();   /* not a relop: try the next diagram */
            break;
        case 1: c = nextchar();
            if (c == '=') state = 2;
            else if (c == '>') state = 3;
            else state = 4;
            break;
        case 2: token.attribute = LE;
            token.name = relop;
            return token;
        case 8: retract(1);
            token.attribute = GT;
            token.name = relop;
            return token;
        case 9: c = nextchar();
            if (isletter(c)) state = 10;
            else state = fail();
            break;
        case 10: c = nextchar();
            if (isletter(c)) state = 10;
            else if (isdigit(c)) state = 10;
            else state = 11;
            break;
        case 11: retract(1);   /* the delimiter is not part of the lexeme */
            entry = install_id( );
            name = gettoken();
            token.name = name;
            token.attribute = entry;
            return token;
        /* cases 12-24 here for numbers */
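        case 25: c = nextchar();
            if (isdigit(c)) state = 26;
            else state = fail();
            break;
        case 26: c = nextchar();
            if (isdigit(c)) state = 26;
            else state = 27;
            break;
        case 27: retract(1);
            /* nexttoken() returns a Token, so NUM is returned as the token
               name with the install_num() entry as its attribute */
            token.attribute = install_num( );
            token.name = NUM;
            return token;
        }
    }
}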
Code for next state
int state = 0, start = 0;
int lexical_value;
int fail()
{
    /* retract the forward pointer to the start of the lexeme
       (lexeme_beginning is the name used in nexttoken()) */
    forward = lexeme_beginning;
    switch (start) {
    case 0: start = 9; break;
    case 9: start = 12; break;
    case 12: start = 20; break;
    case 20: start = 25; break;
    case 25: recover( ); break;
    default: /* compiler error */
        break;
    }
    return start;
}
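The code above calls nextchar() and retract() without showing them. A minimal sketch of how they could work over an in-memory buffer follows. The buffer variables, set_input() and the exact signatures are assumptions for this example and are not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *buffer = "";       /* input text (assumed in memory) */
static size_t input_length = 0;
static size_t lexeme_beginning = 0;   /* start of the current lexeme */
static size_t forward = 0;            /* forward pointer */

/* Hypothetical helper: loads a new input string and resets both pointers. */
void set_input(const char *text)
{
    assert(text != NULL);
    buffer = text;
    input_length = strlen(text);
    lexeme_beginning = 0;
    forward = 0;
}

/* Returns the character under the forward pointer and advances it.
   At or past the end of the input it returns '\0' but still advances,
   so that a following retract(1) restores the position correctly. */
int nextchar(void)
{
    int c = forward < input_length ? (unsigned char)buffer[forward] : '\0';
    forward++;
    return c;
}

/* Moves the forward pointer back by n characters, never past the
   beginning of the current lexeme. */
void retract(size_t n)
{
    assert(n <= forward - lexeme_beginning);
    forward -= n;
}
```

With set_input("ab"), three calls to nextchar() return 'a', 'b' and '\0'; retract(1) then leaves the forward pointer just after the b, which is where the lexeme ends.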