Learn how to use JLex, a Java-based lexical analyzer generator. Explore input file structure, directives, token types, and regex rules. See examples and running instructions.
JLex Lecture 4 Mon, Jan 26, 2004
JLex • JLex is a lexical analyzer generator in Java. • It is based on the well-known lex, which is a lexical analyzer generator in C. • JLex reads a description of a set of tokens and outputs a Java program that will process those tokens.
The JLex Input File • The input file to JLex uses the extension .lex. • The file is divided into three parts. • User code • JLex directives • Regular expression rules • These three sections are separated by %%.
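The three-part layout can be made concrete with a small sketch. The Java program below holds a deliberately tiny, incomplete specification in a string and splits it at the lines that begin with %%. The class name SpecSections, the sample rules, and the token codes 1 and 2 are illustrative assumptions, not part of JLex.

```java
import java.util.ArrayList;
import java.util.List;

final class SpecSections {
    // A tiny sample specification, used only to show the layout of a .lex file.
    static final String SAMPLE =
          "import java.io.*;\n"            // section 1: user code
        + "%%\n"
        + "%integer\n"                     // section 2: JLex directives
        + "%%\n"
        + "[0-9]+ { return 1; }\n"         // section 3: regular expression rules
        + "[a-z]+ { return 2; }\n";

    /** Splits a specification into sections at lines that begin with %%. */
    static List<String> split(String spec) {
        List<String> sections = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : spec.split("\n")) {
            if (line.startsWith("%%")) {
                sections.add(current.toString());
                current.setLength(0);
            } else {
                current.append(line).append('\n');
            }
        }
        sections.add(current.toString());
        return sections;
    }

    public static void main(String[] args) {
        String[] names = {"User code", "JLex directives", "Regular expression rules"};
        List<String> sections = split(SAMPLE);
        for (int i = 0; i < sections.size(); i++) {
            System.out.println("--- " + names[i] + " ---");
            System.out.print(sections.get(i));
        }
    }
}
```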
JLex User Code • See Section 2.1 of the JLex User’s Manual. • Any code written in the user-code section is copied directly into the Java source file created by JLex. • JLex creates a class named Yylex, which is at the heart of the lexer. The user code is not incorporated into this class.
JLex Directives • See Section 2.2 of the JLex User’s Manual. • Any code bracketed within %{ and %} is copied directly into the Yylex class, at the beginning. • Although this code is incorporated into the Yylex class, it is not incorporated into any Yylex member function. • Thus, we may define Yylex class variables or additional member functions.
The init Directive • Code bracketed within %init{ and %init} is copied into the Yylex default constructor, which is called by the other constructors. %init{ System.out.println("In the constructor"); %init}
The eof Directive • Code bracketed within %eof{ and %eof} is copied into the Yylex function yy_do_eof(), which is called once upon end of file. %eof{ System.out.println("In yy_do_eof()"); %eof}
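To see where the code from %{ %}, %init{ %init}, and %eof{ %eof} ends up, here is a hand-written approximation of the shape of the resulting class. It is a sketch, not real JLex output: the class name YylexShape, the trace list, the note() helper, and the eofDone flag are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written approximation of where directive code lands; not generated by JLex. */
final class YylexShape {
    // --- from %{ ... %}: class-level members (variables and extra member functions) ---
    final List<String> trace = new ArrayList<>();

    void note(String message) {
        trace.add(message);
        System.out.println(message);
    }

    // Stand-in for the generator's own bookkeeping, so the EOF code runs only once.
    private boolean eofDone = false;

    YylexShape() {
        // --- from %init{ ... %init}: runs inside the constructor ---
        note("In the constructor");
    }

    void yy_do_eof() {
        if (!eofDone) {
            // --- from %eof{ ... %eof}: runs once upon end of file ---
            note("In yy_do_eof()");
        }
        eofDone = true;
    }

    public static void main(String[] args) {
        YylexShape lexer = new YylexShape();
        lexer.yy_do_eof();
        lexer.yy_do_eof(); // the second call adds nothing to the trace
    }
}
```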
JLex Token Types • Unless we specify otherwise, the data type of the returned tokens is Yytoken. • This class is not created automatically. • We may change the return type to int by typing the directive %integer. • We may change the return type to Integer by typing the directive %intwrap. • We may set the return type to any other type by using the directive %type.
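Because Yytoken is not created automatically, we must write it ourselves, for example in the user-code section or in its own source file. The following is one minimal possibility. The fields kind, text, and line are assumptions chosen for illustration; JLex does not prescribe what the class contains.

```java
/** Minimal hand-written Yytoken; the field names are illustrative assumptions. */
final class Yytoken {
    final int kind;     // hypothetical numeric token code
    final String text;  // the matched lexeme
    final int line;     // line on which the lexeme was found

    Yytoken(int kind, String text, int line) {
        this.kind = kind;
        this.text = text;
        this.line = line;
    }

    @Override
    public String toString() {
        return "Token(kind=" + kind + ", text=\"" + text + "\", line=" + line + ")";
    }

    public static void main(String[] args) {
        // An action in the rules section could build such an object and return it.
        System.out.println(new Yytoken(1, "42", 3));
    }
}
```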
JLex Token Types • If the return type is Yytoken or Integer, then the EOF token is null. • If the return type is int, then the EOF token is -1. • For any other type, we need to specify the EOF value.
JLex EOF Value • By using the %eofval directive, we may indicate what value to return upon EOF. • We write %eofval{ return new type(value); %eofval}
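The effect of an explicit EOF value on the calling code can be sketched with a hand-written stand-in for the lexer. Here the token type Token, the kind names "WORD" and "EOF", and the WordLexer class are all hypothetical; the point is only that the caller tests for the declared EOF value rather than for null.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

final class EofValueDemo {
    /** Hypothetical token type, as might be named by a %type directive. */
    static final class Token {
        final String kind;
        final String text;

        Token(String kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    /** Stand-in for a generated lexer: hands out words, then the EOF value. */
    static final class WordLexer {
        private final Iterator<String> words;

        WordLexer(String input) {
            String trimmed = input.trim();
            words = trimmed.isEmpty()
                ? Collections.<String>emptyIterator()
                : Arrays.asList(trimmed.split("\\s+")).iterator();
        }

        Token yylex() {
            if (words.hasNext()) {
                return new Token("WORD", words.next());
            }
            // Plays the role of: %eofval{ return new Token("EOF", ""); %eofval}
            return new Token("EOF", "");
        }
    }

    /** Counts tokens by looping until the declared EOF value comes back. */
    static int countWords(String input) {
        WordLexer lexer = new WordLexer(input);
        int count = 0;
        for (Token t = lexer.yylex(); !t.kind.equals("EOF"); t = lexer.yylex()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("int x = 42"));
    }
}
```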
JLex Regular Expression Rules • Each regular expression rule consists of a regular expression followed by an associated action. • The associated action is a segment of Java code, enclosed in braces { }. • Typically, the action will be to return the appropriate token.
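The pairing of a regular expression with an action can be imitated in plain Java. The sketch below is not how JLex works internally (JLex generates a table-driven automaton); it uses java.util.regex as a stand-in and keeps a list of rules, each with an action that returns a token. It picks the longest match and prefers the earlier rule on ties, which mirrors the usual lex convention. The names RuleTableDemo, Rule, and the token spellings NUM(...) and ID(...) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RuleTableDemo {
    /** One rule: a pattern plus the action to run on the matched text. */
    static final class Rule {
        final Pattern pattern;
        final Function<String, String> action; // returns a token, or null for "no token"

        Rule(String regex, Function<String, String> action) {
            this.pattern = Pattern.compile(regex);
            this.action = action;
        }
    }

    static final List<Rule> RULES = new ArrayList<>();
    static {
        RULES.add(new Rule("[0-9]+", text -> "NUM(" + text + ")"));
        RULES.add(new Rule("[A-Za-z]+", text -> "ID(" + text + ")"));
        RULES.add(new Rule("[ \\t\\r\\n]+", text -> null)); // whitespace yields no token
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestEnd = pos;
            for (Rule rule : RULES) {
                Matcher m = rule.pattern.matcher(input);
                m.region(pos, input.length());
                // Strict > keeps the earlier rule when two matches have equal length.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = rule;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("No rule matches at offset " + pos);
            }
            String token = best.action.apply(input.substring(pos, bestEnd));
            if (token != null) {
                tokens.add(token);
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("count 42"));
    }
}
```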
JLex Regular Expressions • Regular expressions are expressed using ASCII characters (0 – 127). • The following characters are metacharacters. ? * + | ( ) ^ $ . [ ] { } " \ • Metacharacters have special meaning; they do not represent themselves. • All other characters represent themselves.
JLex Regular Expressions • Let r and s be regular expressions. • r? matches zero or one occurrence of r. • r* matches zero or more occurrences of r. • r+ matches one or more occurrences of r. • r|s matches r or s. • rs matches r concatenated with s.
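These five operators behave the same way in java.util.regex, so they can be tried out without running JLex. The sketch below uses Pattern.matches, which tests whether the whole input matches; the class name OperatorDemo and the sample strings are assumptions.

```java
import java.util.regex.Pattern;

final class OperatorDemo {
    /** True if the whole input matches the regular expression. */
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"ab?", "a"},      // b? with zero occurrences of b
            {"ab?", "abb"},    // b? allows at most one b
            {"ab*", "abbb"},   // b* with several occurrences
            {"ab+", "a"},      // b+ needs at least one b
            {"cat|dog", "dog"},// alternation
            {"ab", "ab"}       // concatenation
        };
        for (String[] c : cases) {
            System.out.println(c[0] + " on \"" + c[1] + "\": " + matches(c[0], c[1]));
        }
    }
}
```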
JLex Regular Expressions • Parentheses are used for grouping. ("+"|"-")? • If a regular expression begins with ^, then it is matched only at the beginning of a line. • If a regular expression ends with $, then it is matched only at the end of a line. • The dot . matches any non-newline character.
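Grouping, the two anchors, and the dot can also be explored with java.util.regex, with two caveats: Java escapes + with a backslash where JLex uses double quotes, and Java needs the MULTILINE flag before ^ and $ apply to every line. Java's dot also excludes a few line terminators besides newline. The class and method names below are assumptions.

```java
import java.util.regex.Pattern;

final class AnchorDemo {
    // JLex writes the optional sign as ("+"|"-")?; java.util.regex writes (\+|-)? instead.
    static boolean isSignedInteger(String s) {
        return Pattern.matches("(\\+|-)?[0-9]+", s);
    }

    /** True if some line of text begins with word (the role of a leading ^). */
    static boolean startsSomeLine(String word, String text) {
        return Pattern.compile("^" + Pattern.quote(word), Pattern.MULTILINE)
                      .matcher(text).find();
    }

    /** True if some line of text ends with word (the role of a trailing $). */
    static boolean endsSomeLine(String word, String text) {
        return Pattern.compile(Pattern.quote(word) + "$", Pattern.MULTILINE)
                      .matcher(text).find();
    }

    /** True if the dot matches the given one-character string. */
    static boolean dotMatches(String oneChar) {
        return Pattern.matches(".", oneChar);
    }

    public static void main(String[] args) {
        System.out.println(isSignedInteger("-17"));
        System.out.println(startsSomeLine("abc", "xyz\nabc"));
        System.out.println(endsSomeLine("abc", "abc\nxyz"));
        System.out.println(dotMatches("\n"));
    }
}
```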
JLex Regular Expressions • Brackets [] match any single character listed within the brackets. • [abc] matches a or b or c. • [A-Za-z] matches any letter. • If the first character after [ is ^, then the brackets match any character except those listed. • [^A-Za-z] matches any nonletter.
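Bracket expressions of this simple form are written identically in java.util.regex, so a quick experiment is possible. In this sketch the class name BracketDemo and its helper methods are assumptions; lettersOnly uses the negated class [^A-Za-z] to delete every nonletter.

```java
import java.util.regex.Pattern;

final class BracketDemo {
    /** True if c is one of a, b, or c. */
    static boolean isAbc(char c) {
        return Pattern.matches("[abc]", String.valueOf(c));
    }

    /** True if c is an ASCII letter. */
    static boolean isLetter(char c) {
        return Pattern.matches("[A-Za-z]", String.valueOf(c));
    }

    /** Removes every character matched by the negated class [^A-Za-z]. */
    static String lettersOnly(String s) {
        return s.replaceAll("[^A-Za-z]", "");
    }

    public static void main(String[] args) {
        System.out.println(isAbc('b'));
        System.out.println(isLetter('7'));
        System.out.println(lettersOnly("JLex 1.2!"));
    }
}
```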
JLex Regular Expressions • A single character within double quotes " " represents itself. • Metacharacters lose their special meaning and represent themselves when they appear within double quotes. • "?" matches ?.
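java.util.regex has no double-quote syntax, but Pattern.quote plays a comparable role: it turns a string into a pattern that matches that string literally. The sketch below contrasts a? as an operator with a? as quoted text. The class and method names are assumptions, and Pattern.quote is an analogy to JLex quoting, not the same mechanism.

```java
import java.util.regex.Pattern;

final class QuoteDemo {
    /** Treats regex as a regular expression, metacharacters included. */
    static boolean asOperator(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    /** Treats text literally, much as JLex treats text inside double quotes. */
    static boolean asLiteral(String text, String input) {
        return Pattern.matches(Pattern.quote(text), input);
    }

    public static void main(String[] args) {
        System.out.println(asOperator("a?", "a"));  // ? acts as "zero or one"
        System.out.println(asLiteral("a?", "a?"));  // ? is an ordinary character
        System.out.println(asLiteral("?", "?"));
    }
}
```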
JLex Escape Sequences • Some escape sequences. • \n matches newline. • \b matches backspace. • \r matches carriage return. • \t matches tab. • \f matches formfeed. • If c is not a special escape-sequence character, then \c matches c.
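Java character literals happen to use the same spellings for these five escapes, which gives an easy way to see the character codes involved. The second helper shows a backslash making a metacharacter literal, again using java.util.regex as a stand-in. One difference to keep in mind: inside a java.util.regex pattern, \b means a word boundary, not backspace. The class and method names are assumptions.

```java
import java.util.regex.Pattern;

final class EscapeDemo {
    /** Returns the numeric character code of c. */
    static int code(char c) {
        return c;
    }

    /** True if a backslash followed by the punctuation character meta matches input. */
    static boolean escapedLiteral(char meta, String input) {
        return Pattern.matches("\\" + meta, input);
    }

    public static void main(String[] args) {
        char[] escapes = {'\n', '\b', '\r', '\t', '\f'};
        String[] names = {"\\n", "\\b", "\\r", "\\t", "\\f"};
        for (int i = 0; i < escapes.length; i++) {
            System.out.println(names[i] + " is character code " + code(escapes[i]));
        }
        System.out.println(escapedLiteral('.', "."));
    }
}
```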
Running JLex • The lexical analyzer generator is the Main class in the JLex folder. • To create a lexical analyzer from the file filename.lex, type java JLex.Main filename.lex • This produces a file filename.lex.java, which must be compiled to create the lexical analyzer.
Running the Lexical Analyzer • To run the lexical analyzer, a Yylex object must first be created. • The Yylex constructor has one parameter specifying an input stream. • For example Yylex lexer = new Yylex(System.in); • Then, calls to the yylex() member function will return tokens. token = lexer.yylex();
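A complete driver usually wraps these two steps in a loop that stops at the EOF value. Since the real Yylex class exists only after JLex has been run, the sketch below substitutes a hand-written stand-in, ToyYylex, that behaves like a lexer built with %integer: it returns int codes and -1 at end of file. The token codes NUM, ID, and OTHER and every class name here are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class DriverDemo {
    static final int YYEOF = -1, NUM = 1, ID = 2, OTHER = 3;

    /** Hand-written stand-in for a generated Yylex that returns int tokens. */
    static final class ToyYylex {
        private final PushbackReader in;

        ToyYylex(InputStream stream) {
            in = new PushbackReader(new InputStreamReader(stream, StandardCharsets.US_ASCII));
        }

        int yylex() throws IOException {
            int c = in.read();
            while (c != -1 && Character.isWhitespace(c)) {
                c = in.read();
            }
            if (c == -1) return YYEOF;
            if (Character.isDigit(c)) { skipRun(true); return NUM; }
            if (Character.isLetter(c)) { skipRun(false); return ID; }
            return OTHER;
        }

        // Consumes the rest of a run of digits or letters, pushing back the first other character.
        private void skipRun(boolean digits) throws IOException {
            int c = in.read();
            while (c != -1 && (digits ? Character.isDigit(c) : Character.isLetter(c))) {
                c = in.read();
            }
            if (c != -1) in.unread(c);
        }
    }

    /** The driver loop: create the lexer, then call yylex() until EOF. */
    static int countTokens(InputStream stream) {
        try {
            ToyYylex lexer = new ToyYylex(stream);
            int count = 0;
            for (int token = lexer.yylex(); token != YYEOF; token = lexer.yylex()) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int countTokens(String text) {
        return countTokens(new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)));
    }

    public static void main(String[] args) {
        // System.in could be passed to countTokens(InputStream) instead of a fixed string.
        System.out.println(countTokens("x = 42") + " tokens");
    }
}
```

With a lexer whose return type is Yytoken or Integer, the loop test would compare against null instead of -1.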