
Understanding JLex: A Java Lexical Analyzer Generator

Learn how to use JLex, a Java-based lexical analyzer generator. Explore input file structure, directives, token types, and regex rules. See examples and running instructions.



  1. JLex Lecture 4 Mon, Jan 26, 2004

  2. JLex • JLex is a lexical analyzer generator in Java. • It is based on the well-known lex, which is a lexical analyzer generator in C. • JLex reads a description of a set of tokens and outputs a Java program that will process those tokens.

  3. The JLex Input File • The input file to JLex uses the extension .lex. • The file is divided into three parts. • User code • JLex directives • Regular expression rules • These three sections are separated by %%.
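The three-part layout can be made concrete with a small Java sketch that holds a tiny specification as a string and splits it at the %% separator lines. The specification contents (the import, the %integer directive, the single digit rule) and the class name SpecSections are made up for illustration, and this is not how JLex itself reads its input.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: splits a tiny, hypothetical JLex spec at its "%%" lines.
class SpecSections {
    // A minimal spec: user code, then directives, then one rule.
    static final String SPEC = String.join("\n",
        "import java.io.*;",
        "%%",
        "%integer",
        "%%",
        "[0-9]+ { return 1; }");

    // Returns the text of each section, in order of appearance.
    static List<String> split(String spec) {
        List<String> sections = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : spec.split("\n")) {
            if (line.equals("%%")) {
                sections.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(line).append('\n');
            }
        }
        sections.add(current.toString().trim());
        return sections;
    }

    public static void main(String[] args) {
        String[] names = {"User code", "JLex directives", "Regular expression rules"};
        List<String> sections = split(SPEC);
        for (int i = 0; i < sections.size(); i++) {
            System.out.println(names[i] + ": " + sections.get(i));
        }
    }
}
```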

  4. JLex User Code • See Section 2.1 of the JLex User’s Manual. • Any code written in the user-code section is copied directly into the Java source file created by JLex. • JLex creates a class named Yylex, which is at the heart of the lexer. The user code is not incorporated into this class.

  5. JLex Directives • See Section 2.2 of the JLex User’s Manual. • Any code bracketed within %{ and %} is copied directly into the Yylex class, at the beginning. • Although this code is incorporated into the Yylex class, it is not incorporated into any Yylex member function. • Thus, we may define Yylex class variables or additional member functions.

  6. The init Directive • Code bracketed within %init{ and %init} is copied into the Yylex default constructor, which is called by the other constructors. %init{ System.out.println("In the constructor"); %init}

  7. The eof Directive • Code bracketed within %eof{ and %eof} is copied into the Yylex function yy_do_eof(), which is called once upon end of file. %eof{ System.out.println("In yy_do_eof()"); %eof}

  8. JLex Token Types • Unless we specify otherwise, the data type of the returned tokens is Yytoken. • This class is not created automatically. • We may change the return type to int by typing the directive %integer. • We may change the return type to Integer by typing the directive %intwrap. • We may set the return type to any other type by using the directive %type.

  9. JLex Token Types • If the return type is Yytoken or Integer, then the EOF token is null. • If the return type is int, then the EOF token is -1. • For any other type, we need to specify the EOF value.
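Because Yytoken is not created automatically, the user has to supply it. The following is a minimal sketch of what such a class might look like; the fields kind and text are assumptions chosen for illustration, not something the slides prescribe.

```java
// Hypothetical token class; JLex does not generate it, so the fields here
// (kind, text) are illustrative choices rather than anything JLex requires.
class Yytoken {
    final int kind;      // user-defined token code, e.g. 1 for a number
    final String text;   // the matched characters

    Yytoken(int kind, String text) {
        this.kind = kind;
        this.text = text;
    }

    @Override
    public String toString() {
        return "Yytoken(" + kind + ", \"" + text + "\")";
    }

    public static void main(String[] args) {
        Yytoken t = new Yytoken(1, "42");
        System.out.println(t);
        // With the default return type, end of file is reported as null.
        Yytoken atEof = null;
        System.out.println(atEof == null ? "EOF" : atEof.toString());
    }
}
```

With a class like this, a rule's action could plausibly end in something like return new Yytoken(1, yytext()); where the token code 1 is again a made-up value.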

  10. JLex EOF Value • By using the %eofval directive, we may indicate what value to return upon EOF. • We write %eofval{ return new type(value); %eofval}

  11. JLex Regular Expression Rules • Each regular expression rule consists of a regular expression followed by an associated action. • The associated action is a segment of Java code, enclosed in braces { }. • Typically, the action will be to return the appropriate token.

  12. JLex Regular Expressions • Regular expressions are expressed using ASCII characters (0 – 127). • The following characters are metacharacters. ? * + | ( ) ^ $ . [ ] { } " \ • Metacharacters have special meaning; they do not represent themselves. • All other characters represent themselves.

  13. JLex Regular Expressions • Let r and s be regular expressions. • r? matches zero or one occurrences of r. • r* matches zero or more occurrences of r. • r+ matches one or more occurrences of r. • r|s matches r or s. • rs matches r concatenated with s.
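These five operators can be tried out without JLex by using java.util.regex, which happens to give them the same meaning. This is only a sketch: java.util.regex is a different engine from the one JLex generates, and the class name RegexOperatorsDemo is made up for the example.

```java
import java.util.regex.Pattern;

// Exercises ?, *, +, | and concatenation with java.util.regex (not JLex).
class RegexOperatorsDemo {
    // True when the whole input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("ab?", "a"));    // true: zero occurrences of b
        System.out.println(matches("ab?", "abb"));  // false: more than one b
        System.out.println(matches("ab*", "abbb")); // true: zero or more b
        System.out.println(matches("ab+", "a"));    // false: at least one b needed
        System.out.println(matches("ab+", "ab"));   // true
        System.out.println(matches("a|b", "b"));    // true: either alternative
        System.out.println(matches("a|b", "c"));    // false
        System.out.println(matches("ab", "ab"));    // true: a followed by b
    }
}
```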

  14. JLex Regular Expressions • Parentheses are used for grouping. ("+"|"-")? • If a regular expression begins with ^, then it is matched only at the beginning of a line. • If a regular expression ends with $, then it is matched only at the end of a line. • The dot . matches any non-newline character.

  15. JLex Regular Expressions • Brackets [] match any single character listed within the brackets. • [abc] matches a or b or c. • [A-Za-z] matches any letter. • If the first character after [ is ^, then the brackets match any character except those listed. • [^A-Za-z] matches any nonletter.
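Grouping, the dot, the line anchors, and bracket expressions can be explored the same way with java.util.regex. This is a rough analogue rather than JLex itself: java.util.regex escapes + with a backslash where JLex would quote it, the MULTILINE flag is needed to make ^ and $ line-oriented, and the trailing [0-9]+ in the signed-integer pattern is an addition for the example.

```java
import java.util.regex.Pattern;

// Exercises grouping, ".", "^", "$" and brackets with java.util.regex (not JLex).
class RegexClassesDemo {
    // True when the whole input matches the pattern.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True when the pattern matches somewhere, with ^ and $ anchored to lines.
    static boolean foundInLines(String regex, String input) {
        return Pattern.compile(regex, Pattern.MULTILINE).matcher(input).find();
    }

    public static void main(String[] args) {
        String signedInt = "(\\+|-)?[0-9]+";   // JLex form: ("+"|"-")?[0-9]+
        System.out.println(full(signedInt, "-12"));          // true
        System.out.println(full(signedInt, "42"));           // true
        System.out.println(full(signedInt, "--1"));          // false
        System.out.println(full(".", "x"));                  // true
        System.out.println(full(".", "\n"));                 // false
        System.out.println(full("[abc]", "b"));              // true
        System.out.println(full("[A-Za-z]", "7"));           // false
        System.out.println(full("[^A-Za-z]", "7"));          // true
        System.out.println(foundInLines("^b$", "a\nb\nc"));  // true
        System.out.println(foundInLines("^b", "ab"));        // false
    }
}
```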

  16. JLex Regular Expressions • A single character within double quotes " " represents itself. • Metacharacters lose their special meaning and represent themselves when they stand alone within double quotes. • "?" matches ?.
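The effect of quoting a metacharacter can be imitated in plain Java with Pattern.quote, which turns a string into a pattern that matches it literally. This is an analogy only: JLex uses double quotes inside the specification, whereas Pattern.quote is a java.util.regex facility, and the class name QuotingDemo is made up.

```java
import java.util.regex.Pattern;

// Contrasts a metacharacter with its quoted, literal form (java.util.regex).
class QuotingDemo {
    // Matches the input against the text taken literally, metacharacters included.
    static boolean matchesLiterally(String text, String input) {
        return Pattern.matches(Pattern.quote(text), input);
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("a*", "aaa"));   // true: * is an operator
        System.out.println(matchesLiterally("a*", "aaa"));  // false: * is literal
        System.out.println(matchesLiterally("a*", "a*"));   // true
        System.out.println(matchesLiterally("?", "?"));     // true
    }
}
```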

  17. JLex Escape Sequences • Some escape sequences. • \n matches newline. • \b matches backspace. • \r matches carriage return. • \t matches tab. • \f matches formfeed. • If c is not a special escape-sequence character, then \c matches c.

  18. Running JLex • The lexical analyzer generator is the Main class in the JLex folder. • To create a lexical analyzer from the file filename.lex, type java JLex.Main filename.lex • This produces a file filename.lex.java, which must be compiled to create the lexical analyzer.

  19. Running the Lexical Analyzer • To run the lexical analyzer, a Yylex object must first be created. • The Yylex constructor has one parameter specifying an input stream. • For example Yylex lexer = new Yylex(System.in); • Then, calls to the yylex() member function will return tokens. token = lexer.yylex();
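A typical driver calls yylex() in a loop until the end-of-file value comes back. The sketch below shows that loop against a hand-written stand-in, StubYylex, so that it runs without JLex; the stub only mimics the interface described above (a constructor taking an input stream, and yylex() returning null at end of file) and returns whitespace-separated words as String tokens for brevity. StubYylex and LexerDriver are hypothetical names, not JLex output.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for a generated Yylex; tokens are plain words.
class StubYylex {
    private final Reader in;

    StubYylex(InputStream stream) {
        this.in = new InputStreamReader(stream, StandardCharsets.US_ASCII);
    }

    // Returns the next word, or null once the input is exhausted.
    String yylex() throws IOException {
        int c = in.read();
        while (c != -1 && Character.isWhitespace(c)) {
            c = in.read();                  // skip leading whitespace
        }
        if (c == -1) {
            return null;                    // end of file
        }
        StringBuilder word = new StringBuilder();
        while (c != -1 && !Character.isWhitespace(c)) {
            word.append((char) c);
            c = in.read();
        }
        return word.toString();
    }
}

class LexerDriver {
    // Calls yylex() until it reports end of file, collecting every token.
    static List<String> tokenize(InputStream stream) {
        StubYylex lexer = new StubYylex(stream);
        List<String> tokens = new ArrayList<>();
        try {
            String token;
            while ((token = lexer.yylex()) != null) {
                tokens.add(token);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        byte[] input = "x = 42 ;".getBytes(StandardCharsets.US_ASCII);
        System.out.println(tokenize(new ByteArrayInputStream(input)));
    }
}
```

With a real generated lexer, the loop condition would compare against whatever EOF value the chosen return type uses, such as null for Yytoken or -1 for int.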
