CSC 3315 Lexical and Syntax Analysis
Hamid Harroud
School of Science and Engineering, Akhawayn University
http://www.aui.ma/~H.Harroud/csc3315/
Constructing a Lexical Analyzer
    state = S                          // S is the start state
    repeat {
        k = next character from the input
        if k == EOF                    // the end of input
            if state is a final state then accept else reject
            // accept and reject both end the loop
        state = T[state, k]
        if state == empty then reject  // got stuck
    }
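The loop above can be sketched in Java as follows. This is a minimal illustration, not the course's implementation: the class name TableDrivenDfa, the classify helper, and the choice of language (identifiers made of a letter followed by letters or digits) are all assumptions made for the example. The table T has one row per state and one column per character class, and EMPTY plays the role of the "empty" entry in the pseudocode.

```java
// Hypothetical sketch of a table-driven recognizer for identifiers.
class TableDrivenDfa {
    static final int START = 0;   // S, the start state
    static final int IN_ID = 1;   // the only final state
    static final int EMPTY = -1;  // "empty" table entry: no transition

    // Character classes select the column of the transition table.
    static final int LETTER = 0, DIGIT = 1, OTHER = 2;

    // T[state][characterClass] gives the next state.
    static final int[][] T = {
        /* START */ { IN_ID, EMPTY, EMPTY },
        /* IN_ID */ { IN_ID, IN_ID, EMPTY },
    };

    static int classify(char c) {
        if (Character.isLetter(c)) return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    // Returns true if the whole input is accepted by the automaton.
    static boolean accepts(String input) {
        int state = START;
        for (int i = 0; i < input.length(); i++) {
            state = T[state][classify(input.charAt(i))];
            if (state == EMPTY) return false;  // got stuck: reject
        }
        return state == IN_ID;  // end of input: accept only in a final state
    }

    public static void main(String[] args) {
        System.out.println(accepts("count2"));
        System.out.println(accepts("2count"));
    }
}
```

Here the end of the string stands in for EOF, so the final-state check happens after the loop rather than inside it.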
Constructing a Lexical Analyzer
    int LexAnalyzer() {
        getChar();
        if (isLetter(nextChar)) {
            // identifier or reserved word: a letter, then letters or digits
            addChar();
            getChar();
            while (isLetter(nextChar) || isDigit(nextChar)) {
                addChar();
                getChar();
            }
            return lookup(lexeme);
        }
        . . .
Constructing a Lexical Analyzer
    int LexAnalyzer() {
        getChar();
        if (isLetter(nextChar)) {
            . . .
        } else if (isDigit(nextChar)) {
            // integer literal: one or more digits
            addChar();
            getChar();
            while (isDigit(nextChar)) {
                addChar();
                getChar();
            }
            return INT_LIT;
        }
    }
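A self-contained version of this hand-coded analyzer might look like the sketch below. The slides do not define getChar, addChar, lookup, or the token codes, so everything here is an assumption for illustration: the class name HandCodedLexer, the token constants, the single reserved word "while", and the UNKNOWN token for unrecognized characters are all hypothetical.

```java
// Hypothetical sketch of a hand-coded lexer for identifiers and integers.
class HandCodedLexer {
    // Token codes (assumed for this example).
    static final int EOF = -1, IDENT = 0, INT_LIT = 1, WHILE_KW = 2, UNKNOWN = 99;

    private final String input;
    private int pos = 0;
    private int nextChar;  // lookahead character, or -1 at end of input
    private final StringBuilder lexeme = new StringBuilder();

    HandCodedLexer(String input) {
        this.input = input;
        getChar();  // prime the lookahead once
    }

    private void getChar() {
        nextChar = pos < input.length() ? input.charAt(pos++) : -1;
    }

    private void addChar() {
        lexeme.append((char) nextChar);
    }

    private static boolean isLetter(int c) { return c >= 0 && Character.isLetter(c); }
    private static boolean isDigit(int c)  { return c >= 0 && Character.isDigit(c); }

    // Distinguishes reserved words from ordinary identifiers.
    private static int lookup(String s) {
        return s.equals("while") ? WHILE_KW : IDENT;
    }

    String lexeme() { return lexeme.toString(); }

    // Returns the code of the next token; its text is available via lexeme().
    int lex() {
        lexeme.setLength(0);
        while (nextChar == ' ' || nextChar == '\t' || nextChar == '\n') getChar();
        if (isLetter(nextChar)) {
            addChar(); getChar();
            while (isLetter(nextChar) || isDigit(nextChar)) { addChar(); getChar(); }
            return lookup(lexeme.toString());
        } else if (isDigit(nextChar)) {
            addChar(); getChar();
            while (isDigit(nextChar)) { addChar(); getChar(); }
            return INT_LIT;
        } else if (nextChar == -1) {
            return EOF;
        }
        addChar(); getChar();  // any other single character
        return UNKNOWN;
    }

    public static void main(String[] args) {
        HandCodedLexer lexer = new HandCodedLexer("while x1 42");
        for (int t = lexer.lex(); t != EOF; t = lexer.lex()) {
            System.out.println(t + " " + lexer.lexeme());
        }
    }
}
```

One design choice differs from the slide: the lookahead is read once in the constructor instead of at the start of every call, so the character that ended the previous token is not skipped when tokens are not separated by whitespace.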
Lexical Errors Consider the following two programs:
JLex: a scanner generator
    jlex specification (xxx.jlex) -> JLex.Main (java) -> generated scanner (xxx.jlex.java)
    xxx.jlex.java -> javac -> Yylex.class
    input program (test.sim) -> P.main (java), using Yylex.class -> output of P.main
JLex: a scanner generator
    import java.io.*;
    import java_cup.runtime.*;   // defines Symbol

    public class P {
        public static void main(String[] args) throws IOException {
            // open the input file and attach the generated scanner to it
            FileReader inFile = new FileReader(args[0]);
            Yylex scanner = new Yylex(inFile);

            // print one line per token until end of file
            Symbol token = scanner.next_token();
            while (token.sym != sym.EOF) {
                switch (token.sym) {
                case sym.INTLITERAL:
                    System.out.println("INTLITERAL ("
                        + ((IntLitTokenVal)token.value).intVal + ")");
                    break;
                …
                }
                token = scanner.next_token();
            }
        }
    }
Regular expression rules
    regular-expression    { action }
    regular-expression: the pattern to be matched
    action: the code to be executed when the pattern is matched
When the next_token() method is called, it repeats the following steps until a return in an action is executed:
    1. Find the longest sequence of characters in the input (starting with the current character) that matches a pattern.
    2. Perform the associated action.
Matching rules
If several patterns match at the current position, the pattern that matches the longest sequence of characters is chosen.
If several patterns match the same (longest) sequence of characters, the first such pattern in the specification is chosen, so the order of the patterns can be important!
If an input character is not matched by any pattern, the scanner throws an exception.
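The matching rules can be simulated with java.util.regex, as in the sketch below. This is not how JLex works internally (JLex generates a DFA); it only imitates the longest-match and first-rule-wins behavior. The class name LongestMatchDemo, the rule names, the IF keyword rule, and the use of IllegalStateException are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: longest match wins, ties go to the earliest rule.
class LongestMatchDemo {
    // Rules listed in priority order; IF comes before ID on purpose.
    static final String[] NAMES = { "IF", "ID", "INT", "ASSIGN", "EQUALS", "WS" };
    static final Pattern[] PATTERNS = {
        Pattern.compile("if"),
        Pattern.compile("[a-zA-Z][a-zA-Z0-9]*"),
        Pattern.compile("[0-9]+"),
        Pattern.compile("="),
        Pattern.compile("=="),
        Pattern.compile("[ \\t\\n]+"),
    };

    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int best = -1, bestLen = 0;
            for (int i = 0; i < PATTERNS.length; i++) {
                Matcher m = PATTERNS[i].matcher(input);
                m.region(pos, input.length());
                // Strict '>' keeps the earlier rule when lengths are equal.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = i;
                    bestLen = m.end() - pos;
                }
            }
            if (best < 0) {
                throw new IllegalStateException("no pattern matches at position " + pos);
            }
            if (!NAMES[best].equals("WS")) {  // whitespace has an empty action
                tokens.add(NAMES[best] + ":" + input.substring(pos, pos + bestLen));
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("if x == 10"));
        System.out.println(scan("iffy=3"));
    }
}
```

With these rules, "==" is scanned as one EQUALS token rather than two ASSIGN tokens (longest match), "if" is scanned as IF because that rule is listed before ID (first rule wins a tie), and "iffy" is scanned as a single ID because the ID pattern matches more characters than IF.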
An Example
    %%
    DIGIT=       [0-9]
    LETTER=      [a-zA-Z]
    WHITESPACE=  [ \t\n]      // space, tab, newline
    %%
    {LETTER}({LETTER}|{DIGIT})*  {System.out.println(yyline+1 + ": ID " + yytext());}
    {DIGIT}+                     {System.out.println(yyline+1 + ": INT");}
    "="                          {System.out.println(yyline+1 + ": ASSIGN");}
    "=="                         {System.out.println(yyline+1 + ": EQUALS");}
    {WHITESPACE}*                { }
    .                            {System.out.println(yyline+1 + ": bad char");}