Lexical Analyzer
Lecturer: Esti Stein, brd4.ort.org.il/~esti2
61102 Compilers, Software Eng. Dept. – Ort Braude
What is a lexical analyzer?
• It reads in characters and groups them into tokens.
• Input: the source program. Output: a stream of (token, value) pairs.
• It works together with the symbol table.
• [Most of the compilation time is spent on lexical analysis.]
Why use a lexical analyzer?
• Modular design: the compiler is partitioned into independent parts.
• The parser deals with words (not characters).
• It isolates character-set dependencies:
    • ASCII versus EBCDIC
• It isolates the representation of symbols:
    • <> versus !=, { } versus begin..end
A token is a placeholder for a logical entity:
• keywords
• constants
• operators
• punctuation
• identifiers
White space and comments are not tokens.
Example of tokenizing
if( val1 + val2 >= 6.5) todo = false;
Example [program]:
token Getoken( )
{
    SkipWhiteSpace( );
    c = getchar( );
    if( isletter( c)) return( ScanForIdentifier( ));
    if( isdigit( c)) return( ScanForConstant( ));
    switch( c) {
        case '(': return( LEFT_PAREN);
        case ')': return( RIGHT_PAREN);
        case '+': return( ScanForAddOrIncrement( ));
        case '=': return( ScanForAssignOrEqual( ));
        case '/': return( ScanForCommentOrDivide( ));
        …
        default: return( ERROR);
    }
}
Automating
Most tokens can be easily defined by a regular grammar:
• The user defines the tokens in a form equivalent to a regular grammar.
• The system converts the grammar into code.
A variety of tools do this, e.g. lex and flex.
Regular Expressions & Automata
See the “Technion” tutorial on automata.
Exercise 1:
A real number consists of two parts:
• The integer part, consisting of one or more digits. A number may not begin with a zero, unless the integer part is just zero.
• The decimal part, consisting of a decimal point followed by one or more digits.
Construct a regular expression for real numbers.
Converting an NDFA to a DFA
Convert to DFA…
Converting an NDFA to a DFA[2]
The Code
S:   c = getchar( );
     if( c == 'a') goto SA;
     if( c == 'b') goto S;
     error( );
SA:  c = getchar( );
     if( c == 'a') goto SAC;
     if( c == 'b') goto S;
     error( );
SAC: c = getchar( );
     if( c == 'a') goto SAC;
     if( c == 'b') goto SBC;
     error( );
…
The Code[2]
token LexicalDriver( LexTable)
{
    state = laststate;
    for(;;) {
        c = NextChar( );
        state = LexTable[ state][ c];   /* next state from the transition table */
        if( state != error && state != finalstate) {
            AddToToken( c);
            AdvanceInput( );
        }
        else break;
    }
    if( state != finalstate) return( ERROR);
    else return( Token[ finalstate]);
}
Output Lexical Errors
• A compiler produces a listing of the compiled program plus error messages, placed near the locations of the errors.
• The errors are queued and printed once a new-line is reached.
• Two ways to recover:
    • Ignore the erroneous token and start a new token.
    • Delete the first character read and start re-reading the input (complicated!).
• Be careful not to propagate error messages: one lexical error should not set off a cascade of further messages.
LEX – the Lexical Analyzer
See the “Technion” tutorial on Lex.