Lexical Analyzer


Presentation Transcript


  1. Lexical Analyzer Lecturer: Esti Stein brd4.ort.org.il/~esti2 61102 Compilers Software Eng. Dept. – Ort Braude

  2. What is a lexical analyzer? It reads in characters and groups them into tokens. [Most of the compilation time is spent on lexical analysis.] Diagram: Source Program → Lexical Analyzer → stream of (token, value) pairs, with the Symbol table attached to the Lexical Analyzer.

  3. Why use a lexical analyzer?
     • Modular design: partitioning the compiler into independent parts.
     • The parser deals with words (not characters).
     • Isolate character set dependencies:
         • ASCII versus EBCDIC
     • Isolate the representation of symbols:
         • <> versus != , { } versus begin..end
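The last point, isolating the representation of symbols, can be illustrated with a minimal sketch. The token codes and the function name classify_symbol are hypothetical and do not come from the slides; the point is only that both spellings of a symbol map to one token code, so the parser never sees which spelling the source used.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token codes; the names are illustrative only. */
typedef enum { TOK_NOT_EQUAL, TOK_BLOCK_OPEN, TOK_BLOCK_CLOSE, TOK_UNKNOWN } TokenKind;

/* Map a lexeme to its token code. Both spellings of each symbol yield
   the same code, so only the lexical analyzer knows about the spelling. */
TokenKind classify_symbol(const char *lexeme)
{
    assert(lexeme != NULL);
    if (strcmp(lexeme, "<>") == 0 || strcmp(lexeme, "!=") == 0)
        return TOK_NOT_EQUAL;
    if (strcmp(lexeme, "{") == 0 || strcmp(lexeme, "begin") == 0)
        return TOK_BLOCK_OPEN;
    if (strcmp(lexeme, "}") == 0 || strcmp(lexeme, "end") == 0)
        return TOK_BLOCK_CLOSE;
    return TOK_UNKNOWN;
}
```

With this in place, supporting another spelling means touching only classify_symbol, not the parser.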

  4. A token is a placeholder for a logical entity:
     • keywords
     • constants
     • operators
     • punctuation
     • identifiers
     White space and comments are not tokens.

  5. Example of tokenizing: if( val1 + val2 >= 6.5) todo = false;
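To make the example concrete, here is a minimal sketch of a tokenizer that handles just this statement. Everything in it is an assumption made for illustration: the Kind and Token types, the function next_token, the small keyword set (including treating false as a keyword), and the limited operator and punctuation sets.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum { T_KEYWORD, T_IDENT, T_CONST, T_OPER, T_PUNCT, T_END, T_ERROR } Kind;

typedef struct {
    Kind kind;
    char text[32];              /* the lexeme, i.e. the token's value */
} Token;

/* Assumed keyword set, for this example only. */
static int is_keyword(const char *s)
{
    return strcmp(s, "if") == 0 || strcmp(s, "true") == 0 || strcmp(s, "false") == 0;
}

/* Read one token starting at *src and advance *src past it.
   Lexemes longer than the text buffer are split; fine for a sketch. */
Token next_token(const char **src)
{
    Token t = { T_END, "" };
    const size_t cap = sizeof t.text - 1;
    const char *p;
    size_t n = 0;

    assert(src != NULL && *src != NULL);
    p = *src;
    while (isspace((unsigned char)*p))          /* white space is not a token */
        p++;

    if (*p == '\0') {                           /* end of input */
        *src = p;
        return t;
    }
    if (isalpha((unsigned char)*p)) {           /* keyword or identifier */
        while (isalnum((unsigned char)*p) && n < cap)
            t.text[n++] = *p++;
        t.text[n] = '\0';
        t.kind = is_keyword(t.text) ? T_KEYWORD : T_IDENT;
    } else if (isdigit((unsigned char)*p)) {    /* constant: digits [ . digits ] */
        while (isdigit((unsigned char)*p) && n < cap)
            t.text[n++] = *p++;
        if (*p == '.' && isdigit((unsigned char)p[1]) && n < cap) {
            t.text[n++] = *p++;
            while (isdigit((unsigned char)*p) && n < cap)
                t.text[n++] = *p++;
        }
        t.kind = T_CONST;
    } else if (*p == '+' || *p == '>' || *p == '=') {
        t.text[n++] = *p++;
        if (t.text[0] != '+' && *p == '=')      /* one-character lookahead: >= or == */
            t.text[n++] = *p++;
        t.kind = T_OPER;
    } else if (*p == '(' || *p == ')' || *p == ';') {
        t.text[n++] = *p++;
        t.kind = T_PUNCT;
    } else {                                    /* anything else is an error here */
        t.text[n++] = *p++;
        t.kind = T_ERROR;
    }
    t.text[n] = '\0';
    *src = p;
    return t;
}
```

Called repeatedly on the statement above, this sketch yields: keyword if, punctuation (, identifier val1, operator +, identifier val2, operator >=, constant 6.5, punctuation ), identifier todo, operator =, keyword false, punctuation ;.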

  6. Example [program]:

       token Getoken( )
       {
           SkipWhiteSpace( );                  /* white space is never returned as a token */
           c = getchar( );
           if( isalpha( c)) return( ScanForIdentifier( ));
           if( isdigit( c)) return( ScanForConstant( ));
           switch( c) {
               case '(': return( LEFT_PAREN);
               case ')': return( RIGHT_PAREN);
               /* the next three need lookahead to tell + from ++, = from ==, / from a comment */
               case '+': return( ScanForAddOrIncrement( ));
               case '=': return( ScanForAssignOrEqual( ));
               case '/': return( ScanForCommentOrDivide( ));
               …
               default: return( ERROR);
           }
       }
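The slide does not show the helper routines. As a hedged sketch of how one of them might work, here is a possible ScanForAssignOrEqual using one character of lookahead. The FILE * parameter and the OpToken codes are assumptions added so the example is self-contained; the slide's version takes no arguments and presumably reads standard input.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { ASSIGN, EQUAL } OpToken;   /* hypothetical token codes */

/* Called after '=' has already been read from `in`. Looks at the next
   character: a second '=' completes "==", anything else is pushed back
   so that it becomes the first character of the next token. */
OpToken ScanForAssignOrEqual(FILE *in)
{
    int c;

    assert(in != NULL);
    c = getc(in);
    if (c == '=')
        return EQUAL;
    if (c != EOF)
        ungetc(c, in);                    /* one character of pushback is guaranteed */
    return ASSIGN;
}
```

ScanForAddOrIncrement and ScanForCommentOrDivide could follow the same pattern, each deciding on the character that follows the one already read.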

  7. Automating: Most tokens can be easily defined by a regular grammar:
     • the user defines the tokens in a form equivalent to a regular grammar
     • the system converts the grammar into code.
     A variety of tools exists, among them lex and flex.

  8. Regular Expressions & Automata: see the “Technion” tutorial about automata.

  9. Exercise 1: A real number consists of two parts:
     • The integer part, consisting of one or more digits. A number may not begin with a zero, unless the integer part is just zero.
     • The decimal part, consisting of a decimal point followed by one or more digits.
     Construct a regular expression for real numbers.

  10. Converting an NDFA to a DFA: Convert to DFA…

  11. Converting an NDFA to a DFA[2]

  12. The Code

        /* one label per DFA state; each goto is a transition */
        S:   c = getchar( );
             if( c == 'a') goto SA;
             if( c == 'b') goto S;
             error( );
        SA:  c = getchar( );
             if( c == 'a') goto SAC;
             if( c == 'b') goto S;
             error( );
        SAC: c = getchar( );
             if( c == 'a') goto SAC;
             if( c == 'b') goto SBC;
             error( );
        …

  13. The Code[2]

        token LexicalDriver( LexTable)
        {
            state = laststate;
            for(;;) {
                c = NextChar( );                 /* look at the next input character */
                state = LexTable[ state][ c];    /* transition lookup by (state, character) */
                if( state != error && state != finalstate) {
                    AddToToken( c);
                    AdvanceInput( );
                }
                else break;
            }
            if( state != finalstate) return( ERROR);
            else return( Token[ finalstate]);
        }

  14. Output Lexical Errors
      • A compiler produces a listing of the compiled program plus error messages, placed near the locations of the errors.
      • The errors are queued and printed once a new-line is reached.
      • Two ways to recover:
          • Ignore the erroneous token and start a new token.
          • Delete the first character read and start re-reading the input. (complicated!)
      • Be careful not to propagate error messages!
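A minimal sketch of the first recovery strategy, combined with the queue-then-print behaviour, might look like this. The ErrorQueue type, the functions scan_line and flush_errors, the message wording, and the rule for what counts as a legal character are all assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_ERRORS 16

typedef struct {
    int columns[MAX_ERRORS];   /* 1-based column where each erroneous token starts */
    int count;
} ErrorQueue;

/* Hypothetical rule for this sketch: letters, digits, white space and
   a few symbols are legal; every other character is a lexical error. */
static int is_legal(int c)
{
    return isalnum(c) || isspace(c) || (c != '\0' && strchr("+-*/=();<>", c) != NULL);
}

/* Scan one line and queue its errors. A run of illegal characters is
   treated as one erroneous token: it is skipped and reported once, so a
   single bad token does not produce a cascade of messages. */
void scan_line(const char *line, ErrorQueue *q)
{
    int i = 0;

    assert(line != NULL && q != NULL);
    q->count = 0;
    while (line[i] != '\0') {
        if (is_legal((unsigned char)line[i])) {
            i++;
            continue;
        }
        if (q->count < MAX_ERRORS)
            q->columns[q->count++] = i + 1;
        while (line[i] != '\0' && !is_legal((unsigned char)line[i]))
            i++;                          /* ignore the erroneous token */
    }
}

/* Called at the end of the line: print the listing line, then the
   queued messages, then empty the queue. */
void flush_errors(const char *line, ErrorQueue *q)
{
    int k;

    assert(line != NULL && q != NULL);
    printf("%s\n", line);
    for (k = 0; k < q->count; k++)
        printf("  error: illegal token at column %d\n", q->columns[k]);
    q->count = 0;
}
```

For the line x = 3 @@# y $ 2; this sketch queues two errors, at columns 7 and 13, even though four illegal characters appear.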

  15. LEX – the Lexical Analyzer: see the “Technion” tutorial about Lex.
