Lexical Analyzer

kirby

  1. Lexical Analyzer
     • The main task of the lexical analyzer is to read the input source program, scan its characters, and produce a sequence of tokens that the parser can use for syntactic analysis.
     • A common interface is for the parser to call the lexical analyzer to produce one token at a time
         • Maintain internal state of reading the input program (with lines)
         • Have a function “getNextToken” that reads some characters at the current position in the input and returns a token to the parser
     • Other tasks of the lexical analyzer include
         • Skipping white space and comments
         • Keeping track of line numbers for error reporting
             • Sometimes it can also produce the annotated lines for error reports
         • Producing the value of the token
         • Inserting identifiers into the symbol table
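The interface described on this slide can be sketched roughly as follows. This is a minimal illustration, not the presentation's own code: the class name `Lexer`, the token kinds `IDENT`, `INT`, `OP` and `EOF`, the `#` comment syntax, and the shape of the symbol table entries are all assumptions made for the example.

```python
import string

LETTERS = string.ascii_letters + "_"
DIGITS = string.digits


class Lexer:
    """Holds the scanning state between calls made by the parser."""

    def __init__(self, text):
        self.text = text
        self.pos = 0        # index of the next unread character
        self.line = 1       # current line number, for error reporting
        self.symbols = {}   # identifier name -> symbol table entry

    def _skip_blanks_and_comments(self):
        text = self.text
        while self.pos < len(text):
            ch = text[self.pos]
            if ch == "\n":
                self.line += 1
                self.pos += 1
            elif ch in " \t\r":
                self.pos += 1
            elif ch == "#":  # assumed comment syntax: '#' to end of line
                while self.pos < len(text) and text[self.pos] != "\n":
                    self.pos += 1
            else:
                break

    def get_next_token(self):
        """Return the next token as a (kind, value, line) tuple."""
        self._skip_blanks_and_comments()
        text, start = self.text, self.pos
        if start >= len(text):
            return ("EOF", None, self.line)
        ch = text[start]
        if ch in LETTERS:
            while self.pos < len(text) and text[self.pos] in LETTERS + DIGITS:
                self.pos += 1
            name = text[start:self.pos]
            # One entry per identifier; keep the line of first appearance.
            self.symbols.setdefault(name, {"first_line": self.line})
            return ("IDENT", name, self.line)
        if ch in DIGITS:
            while self.pos < len(text) and text[self.pos] in DIGITS:
                self.pos += 1
            return ("INT", int(text[start:self.pos]), self.line)
        self.pos += 1  # anything else is treated as a one-character operator
        return ("OP", ch, self.line)


# The parser side: pull tokens one at a time until end of file.
lexer = Lexer("x = 42  # answer\ny = x")
tokens = []
while True:
    token = lexer.get_next_token()
    tokens.append(token)
    if token[0] == "EOF":
        break
```

The parser never sees white space or comments; it only sees the tuples returned by `get_next_token`, and the line number travels with each token so later phases can report errors against the right line.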

  2. Character Level Scanning
     • The lexical analyzer needs a well-defined set of valid characters
         • Produce invalid character errors
         • Delete invalid characters from the token stream so that they are not used in the parser analysis
             • E.g. don’t want invisible characters in error messages
     • For every end-of-line, keep track of line numbers for error reporting
     • Skip over white space and comments
         • If comments are nested (not common), must keep track of the nesting depth to find the end of the comment
     • May produce hidden tokens, for convenience of scanner structure
     • Always produce an end-of-file token
         • Important that quoted strings and comments don’t get stuck if an unexpected end of file occurs
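The character-level rules above might look like this in code. It is a sketch under stated assumptions: the valid character set is printable ASCII plus blanks, comments use the nestable delimiters `(*` and `*)`, and the function name `scan_characters` is made up for the example. Invalid characters are reported by code point so that invisible characters never appear literally in a message.

```python
import string

# Assumed valid character set: printable ASCII plus space, tab and newline.
VALID_CHARS = set(string.ascii_letters + string.digits + string.punctuation + " \t\n")


def scan_characters(text):
    """Return (kept, errors): the characters outside comments, and error messages."""
    kept, errors = [], []
    line = 1
    depth = 0          # current comment nesting depth
    comment_line = 0   # line where the outermost open comment started
    i = 0
    while i < len(text):
        ch = text[i]
        pair = text[i:i + 2]
        if pair == "(*":
            if depth == 0:
                comment_line = line
            depth += 1
            i += 2
            continue
        if pair == "*)" and depth > 0:
            depth -= 1
            i += 2
            continue
        if ch == "\n":
            line += 1              # count every end-of-line, even inside comments
            if depth == 0:
                kept.append(ch)
        elif depth > 0:
            pass                   # inside a comment: skip the character
        elif ch not in VALID_CHARS:
            # Report and drop; show the code point, not the raw character.
            errors.append(f"line {line}: invalid character U+{ord(ch):04X}")
        else:
            kept.append(ch)
        i += 1
    if depth > 0:
        # Unexpected end of file inside a comment: report it instead of hanging.
        errors.append(f"line {comment_line}: comment not closed before end of file")
    return "".join(kept), errors


kept, errors = scan_characters("a (* x (* y *) z *) b\x07\nc (* open")
```

Here the nested comment is skipped as a unit, the bell character is reported and dropped, and the comment left open on the second line produces an error at end of file rather than an endless scan.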

  3. Tokens, token types and values
     • The set of tokens is typically something like the following table (the table itself is not included in this transcript)
         • Or may have separate token types for different operators or reserved words
     • May want to keep line number with each token
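Since the slide's table is not available, the following is only a hypothetical set of token types, chosen to illustrate the two designs the slide mentions: a coarse scheme with one type for all reserved words or operators, and a fine scheme with one type per reserved word or operator. All names here (`TokenType`, `FineTokenType`, `Token`) are assumptions for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TokenType(Enum):
    """Coarse scheme: the token value says which operator or word it is."""
    IDENTIFIER = auto()
    INTEGER = auto()
    STRING = auto()
    RESERVED_WORD = auto()
    OPERATOR = auto()
    END_OF_FILE = auto()


class FineTokenType(Enum):
    """Fine scheme: a separate type per reserved word and operator."""
    IDENTIFIER = auto()
    INTEGER = auto()
    WHILE = auto()
    IF = auto()
    PLUS = auto()
    ASSIGN = auto()
    END_OF_FILE = auto()


@dataclass(frozen=True)
class Token:
    type: Enum            # a member of either scheme above
    value: object = None  # e.g. the identifier name or the numeric value
    line: int = 0         # line number kept with each token


coarse = Token(TokenType.RESERVED_WORD, "while", line=3)
fine = Token(FineTokenType.WHILE, line=3)
```

With the fine scheme the parser can dispatch on the token type alone; with the coarse scheme it must also inspect the value.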

  4. Token Actions
     • Each token recognized can have an action function
         • Many token types produce a value
             • In the case of numeric values, make sure the proper numeric errors are produced, e.g. integer overflow
         • Put identifiers in the symbol table
             • Note that at this time, no effort is made to distinguish scope; there will be one symbol table entry for each identifier
             • Later, separate scope instances will be produced
     • Other types of actions
         • End-of-line (can be treated as a token type that doesn’t output to the parser)
             • Increment line number
             • Get next line of input to scan
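One way to organize token actions is a table that maps each token kind to a function. The sketch below assumes a 32-bit signed integer limit for the target language and uses made-up names (`LexError`, `ACTIONS`, `apply_action`); the real limit and the contents of a symbol table entry depend on the language being compiled.

```python
MAX_INT = 2**31 - 1  # assumed limit of the target language's integer type


class LexError(Exception):
    """Raised when a token's action detects an error."""


symbol_table = {}  # one entry per identifier name; scope is ignored here


def integer_action(lexeme, line):
    """Produce the numeric value, reporting overflow as a lexical error."""
    value = int(lexeme)
    if value > MAX_INT:
        raise LexError(f"line {line}: integer constant {lexeme} is too large")
    return value


def identifier_action(lexeme, line):
    """Return the symbol table entry, creating it on first appearance."""
    return symbol_table.setdefault(lexeme, {"name": lexeme, "first_line": line})


ACTIONS = {"INT": integer_action, "IDENT": identifier_action}


def apply_action(kind, lexeme, line):
    """Run the action for this token kind; kinds without one keep the lexeme."""
    action = ACTIONS.get(kind)
    return action(lexeme, line) if action else lexeme
```

Calling `apply_action("IDENT", "x", 1)` and later `apply_action("IDENT", "x", 5)` returns the same entry both times, which matches the slide's point that there is one symbol table entry per identifier until scopes are introduced later.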

  5. Testing
     • Execute the lexical analyzer with test cases and compare the results with the expected results
     • Test cases
         • Exercise every part of the lexical analyzer code
         • Produce every error message
         • Don’t have to be valid programs, just valid sequences of tokens
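A table-driven test harness is one simple way to follow this advice. The small regex-based `tokenize` function below is a stand-in written only so the example runs on its own; the token kinds and the choice of `ValueError` for invalid characters are assumptions, not part of the presentation.

```python
import re

WS_RE = re.compile(r"\s*")
TOKEN_RE = re.compile(
    r"(?P<INT>\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<OP>[-+*/=()])", re.ASCII
)


def tokenize(text):
    """Stand-in lexer: return a list of (kind, lexeme) pairs ending with EOF."""
    tokens = []
    pos = WS_RE.match(text, 0).end()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"invalid character {text[pos]!r}")
        tokens.append((m.lastgroup, m.group()))
        pos = WS_RE.match(text, m.end()).end()
    tokens.append(("EOF", ""))
    return tokens


# Each case pairs an input with the expected token sequence.
CASES = [
    ("x = 1", [("IDENT", "x"), ("OP", "="), ("INT", "1"), ("EOF", "")]),
    # Not a valid program, but a valid sequence of tokens.
    ("+ + 7 y", [("OP", "+"), ("OP", "+"), ("INT", "7"), ("IDENT", "y"), ("EOF", "")]),
    ("", [("EOF", "")]),
]

# Inputs that must trigger the lexer's error message.
ERROR_CASES = ["x = $", "@"]


def run_tests():
    """Return a list of failure descriptions; empty means all cases passed."""
    failures = []
    for source, expected in CASES:
        actual = tokenize(source)
        if actual != expected:
            failures.append(f"{source!r}: expected {expected}, got {actual}")
    for source in ERROR_CASES:
        try:
            tokenize(source)
        except ValueError:
            continue
        failures.append(f"{source!r}: expected an invalid character error")
    return failures


failures = run_tests()
```

Keeping the cases in a table makes it easy to check coverage: there should be at least one case per branch of the lexer and one per error message.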

  6. Resources
     • Per Brinch Hansen, Brinch Hansen on Pascal Compilers. Prentice-Hall, 1985. Out of print.
     • Aho, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986. (The red dragon book)
