Learn why lexical and syntax analyses are separate for simpler design, efficiency, and better portability. Explore tokens, patterns, lexemes, and operations on languages. Understand regular expressions, finite automata, and converting NFAs to DFAs.
Lexical Analysis
• Why separate lexical and syntax analyses?
  • Simpler design
  • Efficiency
  • Portability
by Neng-Fa Zhou
Tokens, Patterns, Lexemes
• Tokens
  • Terminal symbols in the grammar
• Patterns
  • Descriptions of the form that the lexemes of a token may take
• Lexemes
  • Words (character sequences) in the source program that match the pattern of a token
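A minimal Python sketch of the three terms, assuming a hypothetical token set (NUM, ID, OP, WS): each token name is paired with a pattern, and the scanner reports the lexemes it finds. Python's `re` alternation takes the first pattern that matches rather than the longest, which is good enough for this toy set.

```python
import re

# Hypothetical token set: each token name is paired with a pattern
# (a regular expression) describing the lexemes of that token.
TOKEN_PATTERNS = [
    ("NUM", r"[0-9]+"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("OP", r"[-+*/=]"),
    ("WS", r"[ \t\n]+"),
]

# One combined regex with a named group per token.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tokenize(text):
    """Return (token, lexeme) pairs; whitespace is matched but discarded."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            pairs.append((m.lastgroup, m.group()))
        pos = m.end()
    return pairs


print(tokenize("count = count + 12"))
# [('ID', 'count'), ('OP', '='), ('ID', 'count'), ('OP', '+'), ('NUM', '12')]
```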
Languages
• Fixed and finite alphabet (vocabulary)
• Finite-length sentences
• Possibly infinite number of sentences
• Examples
  • Natural numbers: {1, 2, 3, ..., 10, 11, ...}
  • Strings over {a, b}: a^n b a^n
• Terms for parts of a string
  • prefix, suffix, substring, proper, ...
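The string terms can be pinned down with a short Python sketch. It follows the usual convention that a proper prefix, suffix, or substring of s is one that is neither ε (the empty string) nor s itself; the helper names are illustrative.

```python
def prefixes(s):
    """All prefixes of s, from the empty string up to s itself."""
    return [s[:i] for i in range(len(s) + 1)]


def suffixes(s):
    """All suffixes of s, from s itself down to the empty string."""
    return [s[i:] for i in range(len(s) + 1)]


def substrings(s):
    """All distinct substrings of s, including the empty string and s."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


def proper(parts, s):
    """Keep only the proper ones: neither the empty string nor s itself."""
    return [p for p in parts if p != "" and p != s]


print(proper(prefixes("abc"), "abc"))   # ['a', 'ab']
print(proper(suffixes("abc"), "abc"))   # ['bc', 'c']
print(sorted(substrings("abc")))        # ['', 'a', 'ab', 'abc', 'b', 'bc', 'c']
```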
Operations on Languages
Examples
L = {A, B, ..., Z, a, b, ..., z}
D = {0, 1, ..., 9}
L ∪ D: the set of letters and digits
LD: the set of strings consisting of a letter followed by a digit
L^4: the set of four-letter strings
L*: all strings of letters, including ε (the empty string)
L(L ∪ D)*: strings of letters and digits beginning with a letter
D+: strings of one or more digits
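A small Python sketch of these operations on finite sets of strings. Because L* and D+ are infinite, the helpers `star_upto` and `plus_upto` (hypothetical names) only build the strings made of at most k pieces.

```python
import string


def union(L, M):
    """Union: strings in L or in M."""
    return set(L) | set(M)


def concat(L, M):
    """Concatenation LM: a string of L followed by a string of M."""
    return {x + y for x in L for y in M}


def power(L, n):
    """L^n: n-fold concatenation of L; L^0 = {""}, i.e. {epsilon}."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result


def star_upto(L, k):
    """Finite slice of the Kleene closure L*: L^0 | L^1 | ... | L^k."""
    result = set()
    for n in range(k + 1):
        result |= power(L, n)
    return result


def plus_upto(L, k):
    """Finite slice of the positive closure L+: L^1 | ... | L^k."""
    result = set()
    for n in range(1, k + 1):
        result |= power(L, n)
    return result


L = set(string.ascii_letters)  # {A, ..., Z, a, ..., z}
D = set(string.digits)         # {0, ..., 9}

print(len(union(L, D)))        # 62
print(len(concat(L, D)))       # 520
print(len(power(D, 2)))        # 100
ids = concat(L, star_upto(union(L, D), 1))   # L(L u D)* cut off at length 2
print("x9" in ids, "9x" in ids)              # True False
print("" in star_upto(L, 2), "" in plus_upto(D, 2))   # True False
```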
Regular Expression (RE)
• ε is a RE
• A symbol in Σ is a RE
• Let r and s be REs:
  • (r) | (s): or
  • (r)(s): concatenation
  • (r)*: zero or more instances
  • (r)+: one or more instances
  • (r)?: zero or one instance
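The inductive definition maps directly onto a small abstract syntax. This Python sketch (the class and function names are illustrative) computes the part of L(e) made of strings of length at most k, and writes r+ and r? as shorthands: r+ = rr* and r? = r|ε.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eps:           # epsilon
    pass


@dataclass(frozen=True)
class Sym:           # a symbol of the alphabet
    a: str


@dataclass(frozen=True)
class Alt:           # r | s
    r: object
    s: object


@dataclass(frozen=True)
class Cat:           # rs
    r: object
    s: object


@dataclass(frozen=True)
class Star:          # r*
    r: object


def lang(e, k):
    """The strings of L(e) whose length is at most k."""
    if isinstance(e, Eps):
        return {""}
    if isinstance(e, Sym):
        return {e.a} if k >= 1 else set()
    if isinstance(e, Alt):
        return lang(e.r, k) | lang(e.s, k)
    if isinstance(e, Cat):
        return {x + y for x in lang(e.r, k) for y in lang(e.s, k) if len(x + y) <= k}
    if isinstance(e, Star):
        base = lang(e.r, k) - {""}
        result, frontier = {""}, {""}
        while frontier:   # append one more nonempty piece until nothing new fits
            frontier = {x + y for x in frontier for y in base if len(x + y) <= k} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {e!r}")


def plus(r):         # r+ = r r*
    return Cat(r, Star(r))


def opt(r):          # r? = r | epsilon
    return Alt(r, Eps())


print(sorted(lang(plus(Sym("a")), 3)))   # ['a', 'aa', 'aaa']
```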
Examples
Σ = {a, b}
1. a|b
2. (a|b)(a|b)
3. a*
4. (a|b)*
5. a|a*b
Precedence of operators (all left associative), from high to low:
  r*  r+  r?
  rs
  r|s
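Python's `re` module uses the same precedence (closure, then concatenation, then alternation), so a short sketch can list each example's strings up to length 3. Note that a|a*b is read as a | ((a*)b), not (a|a*)b.

```python
import re
from itertools import product


def language_upto(pattern, max_len):
    """Strings over {a, b} of length <= max_len that match the whole pattern."""
    words = ("".join(p) for n in range(max_len + 1) for p in product("ab", repeat=n))
    return [w for w in words if re.fullmatch(pattern, w)]


for pattern in ["a|b", "(a|b)(a|b)", "a*", "(a|b)*", "a|a*b"]:
    print(pattern, language_upto(pattern, 3))
# a|b ['a', 'b']
# (a|b)(a|b) ['aa', 'ab', 'ba', 'bb']
# a* ['', 'a', 'aa', 'aaa']
# (a|b)* (all 15 strings of length 0 to 3)
# a|a*b ['a', 'b', 'ab', 'aab']
```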
Algebraic Properties of RE
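Regular Definitions
d1 → r1
d2 → r2
...
dn → rn
• Each di is a new symbol (not in Σ and distinct from the other d's), and each ri is a RE over Σ ∪ {d1, d2, ..., di-1}
• Not recursive: ri may refer only to names defined before it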
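Because each ri mentions only earlier names, a list of regular definitions can be expanded into plain REs in one left-to-right pass. This Python sketch assumes names are referenced as {name}, as in Lex, and uses Python `re` syntax for the right-hand sides; the definitions of letter, digit, and id are illustrative.

```python
import re


def expand(definitions):
    """Expand an ordered list of (name, rhs) regular definitions into plain REs."""
    expanded = {}
    for name, rhs in definitions:
        # Substitute every earlier, already-expanded name.
        for earlier, body in expanded.items():
            rhs = rhs.replace("{" + earlier + "}", "(" + body + ")")
        # A name still left is undefined or defined later (e.g. recursive): rejected.
        if re.search(r"\{[A-Za-z_]\w*\}", rhs):
            raise ValueError(f"{name}: refers to a name that is not defined earlier")
        expanded[name] = rhs
    return expanded


defs = [
    ("letter", "[A-Za-z]"),
    ("digit", "[0-9]"),
    ("id", "{letter}({letter}|{digit})*"),
]
regs = expand(defs)
print(regs["id"])   # ([A-Za-z])(([A-Za-z])|([0-9]))*
print(bool(re.fullmatch(regs["id"], "x25")), bool(re.fullmatch(regs["id"], "2x")))   # True False
```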
Example-1
%{
#include <stdio.h>
int num_lines = 0, num_chars = 0;
%}
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main(void)
{
    yylex();
    printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    return 0;
}

/* Returning 1 tells the scanner there is no more input after EOF. */
int yywrap(void) { return 1; }
Example-2
D       [0-9]
INT     {D}{D}*
%%
{INT}("."{INT}((e|E)("+"|-)?{INT})?)?   { printf("valid %s\n", yytext); }
.                                       { printf("unrecognized %s\n", yytext); }
%%
int main(int argc, char *argv[])
{
    ++argv, --argc;                     /* skip over the program name */
    if (argc > 0) {
        yyin = fopen(argv[0], "r");
        if (yyin == NULL) { perror(argv[0]); return 1; }
    } else {
        yyin = stdin;
    }
    yylex();
    return 0;
}

int yywrap(void) { return 1; }
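java.util.regex
import java.util.regex.*;

class Number {
    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Usage: java Number string");
            return;
        }
        String regExNum = "\\d+(\\.\\d+((e|E)(\\+|-)?\\d+)?)?";
        // Pattern.matches succeeds only if the whole input matches the regex
        if (Pattern.matches(regExNum, args[0]))
            System.out.println("valid");
        else
            System.out.println("invalid");
    }
}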
Regex in Perl
print "Input a string :";
$_ = <STDIN>;
chomp($_);
if (/^[0-9]+(\.[0-9]+((e|E)(\+|-)?[0-9]+)?)?$/) {
    print "valid\n";
} else {
    print "invalid\n";
}
Regex in Python
import re
import sys

# Raw string, so that \. and \+ reach the regex engine unchanged
num = r"[0-9]+(\.[0-9]+((e|E)(\+|-)?[0-9]+)?)?"
if len(sys.argv) != 2:
    print("Usage: python num string")
    sys.exit(1)
res = re.fullmatch(num, sys.argv[1])
if res:
    print("valid")
else:
    print("invalid")
Finite Automata
• Nondeterministic finite automaton (NFA): NFA = (S, Σ, T, s0, F)
  • S: a set of states
  • Σ: a set of input symbols
  • T: a transition mapping from S × (Σ ∪ {ε}) to sets of states
  • s0: the start state
  • F: the set of final (accepting) states
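One way to hold the five components in Python. This sketch assumes the NFA has no ε-transitions (those need ε-closure) and uses, as an illustrative example, a four-state NFA for (a|b)*abb; acceptance is checked by tracking the set of states the NFA could be in.

```python
from dataclasses import dataclass


@dataclass
class NFA:
    states: frozenset   # S
    symbols: frozenset  # Sigma
    trans: dict         # T: (state, symbol) -> set of next states
    start: int          # s0
    finals: frozenset   # F

    def accepts(self, word):
        """Follow every possible path at once by keeping a set of current states."""
        current = {self.start}
        for c in word:
            current = {t for s in current for t in self.trans.get((s, c), ())}
        return bool(current & self.finals)


# Illustrative NFA for (a|b)*abb: on 'a', state 0 may stay or guess that "abb" has begun.
abb = NFA(
    states=frozenset({0, 1, 2, 3}),
    symbols=frozenset("ab"),
    trans={(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}},
    start=0,
    finals=frozenset({3}),
)
print(abb.accepts("aabb"), abb.accepts("abab"))   # True False
```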
Example
Deterministic Finite Automata (DFA)
• T: a transition function
• There is exactly one arc going out of each node on each symbol, and there are no ε-arcs.
Simulating a DFA
s = s0;
c = nextchar;
while (c != eof) {
    s = move(s, c);
    if (s == error_s) break;
    c = nextchar;
}
if (s is in F) return "yes";
else return "no";
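A runnable Python version of the DFA-simulation loop. The transition function `move` is a dictionary here, and a missing entry stands in for error_s; the example DFA, which accepts (a|b)*abb, is an illustrative assumption.

```python
def simulate_dfa(move, start, finals, text):
    """Return "yes" if the DFA accepts text, else "no"."""
    s = start
    for c in text:              # c = nextchar, until eof
        s = move.get((s, c))    # s = move(s, c)
        if s is None:           # s == error_s
            break
    return "yes" if s in finals else "no"


# Illustrative DFA for (a|b)*abb: state 3 means "the input so far ends in abb".
MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
print(simulate_dfa(MOVE, 0, {3}, "babb"))   # yes
print(simulate_dfa(MOVE, 0, {3}, "abc"))    # no ('c' has no transition)
```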
From RE to NFA
• ε
• a in Σ
• s|t
From RE to NFA (cont.)
• st
• s*
Example: (a|b)*a
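A compact Python sketch of Thompson's construction, applied to (a|b)*a. Each rule returns a fragment with one start and one accepting state. One assumption to note: `cat` links the two fragments with an ε-edge instead of merging the first accepting state with the second start state, which adds an ε-move but accepts the same language. The small `accepts` helper, which computes ε-closures, is included only to check the result.

```python
from itertools import count

EPS = None  # label used for epsilon-edges


class Thompson:
    """Builds an NFA as a list of (from_state, label, to_state) edges."""

    def __init__(self):
        self.edges = []
        self._next = count()

    def _new(self):
        return next(self._next)

    def sym(self, a):          # a in Sigma (or EPS for the epsilon rule)
        i, f = self._new(), self._new()
        self.edges.append((i, a, f))
        return i, f

    def alt(self, s, t):       # s|t
        i, f = self._new(), self._new()
        self.edges += [(i, EPS, s[0]), (i, EPS, t[0]), (s[1], EPS, f), (t[1], EPS, f)]
        return i, f

    def cat(self, s, t):       # st
        self.edges.append((s[1], EPS, t[0]))
        return s[0], t[1]

    def star(self, s):         # s*: loop back, or skip s entirely
        i, f = self._new(), self._new()
        self.edges += [(i, EPS, s[0]), (s[1], EPS, s[0]), (i, EPS, f), (s[1], EPS, f)]
        return i, f


def accepts(edges, start, accept, word):
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            u = stack.pop()
            for x, label, y in edges:
                if x == u and label is EPS and y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    current = closure({start})
    for c in word:
        current = closure({y for x, label, y in edges if x in current and label == c})
    return accept in current


nfa = Thompson()
start, accept = nfa.cat(nfa.star(nfa.alt(nfa.sym("a"), nfa.sym("b"))), nfa.sym("a"))
print(accepts(nfa.edges, start, accept, "ba"), accepts(nfa.edges, start, accept, "ab"))   # True False
```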
Building a Lexical Analyzer
RE → NFA → DFA → Emulator
• RE → NFA: Algorithm 3.23 (Thompson's construction)
• NFA → DFA: Algorithm 3.32 (subset construction)
Conversion of an NFA into a DFA
• Intuition
  • move(s,a) is a function in a DFA
  • move(s,a) is a mapping (to a set of states) in an NFA
• A state reachable from s0 in the DFA on an input string corresponds to the set of NFA states that are reachable on the same string.
Computation of ε-Closure
ε-closure(T): the set of NFA states that are reachable from some state in T by ε-transitions alone.
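The usual stack-based computation of ε-closure, as a Python sketch. The ε-edges below are an assumed example: those of the standard 11-state Thompson NFA for (a|b)*abb, with states numbered 0 to 10.

```python
def eps_closure(T, eps):
    """NFA states reachable from some state in T using epsilon-transitions alone.

    eps maps a state to the states it reaches by one epsilon-edge.
    """
    stack = list(T)
    closure = set(T)
    while stack:
        t = stack.pop()
        for u in eps.get(t, ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure


EPS_EDGES = {0: [1, 7], 1: [2, 4], 3: [6], 5: [6], 6: [1, 7]}
print(sorted(eps_closure({0}, EPS_EDGES)))      # [0, 1, 2, 4, 7]
print(sorted(eps_closure({3, 8}, EPS_EDGES)))   # [1, 2, 3, 4, 6, 7, 8]
```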
From an NFA to a DFA (the subset construction)
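A Python sketch of the subset construction, run on an assumed example NFA for (a|b)*abb (11 states, 0 to 10). Each DFA state is a frozenset of NFA states; an empty set of successors is treated as the DFA's error state and left out of the transition table.

```python
def eps_closure(T, eps):
    """NFA states reachable from T by epsilon-transitions alone."""
    stack, closure = list(T), set(T)
    while stack:
        t = stack.pop()
        for u in eps.get(t, ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure


def subset_construction(start, finals, eps, moves, alphabet):
    """moves maps (nfa_state, symbol) -> list of successor NFA states."""
    d_start = frozenset(eps_closure({start}, eps))
    dstates = [d_start]          # every DFA state found so far, in discovery order
    dtran = {}
    i = 0
    while i < len(dstates):      # dstates[i:] are the unmarked states
        T = dstates[i]
        i += 1                   # mark T
        for a in alphabet:
            step = {v for s in T for v in moves.get((s, a), ())}
            U = frozenset(eps_closure(step, eps))
            if not U:
                continue         # no transition: the DFA's error state
            if U not in dstates:
                dstates.append(U)
            dtran[(T, a)] = U
    d_finals = {S for S in dstates if S & finals}
    return d_start, dstates, dtran, d_finals


EPS_EDGES = {0: [1, 7], 1: [2, 4], 3: [6], 5: [6], 6: [1, 7]}
MOVES = {(2, "a"): [3], (4, "b"): [5], (7, "a"): [8], (8, "b"): [9], (9, "b"): [10]}

d_start, dstates, dtran, d_finals = subset_construction(0, {10}, EPS_EDGES, MOVES, "ab")
print(len(dstates), len(d_finals))   # 5 1
```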
Example (figure: an NFA and the equivalent DFA)
Algorithm 3.39 (minimizing the number of states of a DFA)
P = {F, S-F};
do begin
    P0 = P;
    for each group G in P do begin
        partition G into subgroups such that two states s and t
        of G are in the same subgroup iff for all input symbols a,
        s and t have transitions on a to states in the same group;
        replace G in P by the set of all subgroups formed;
    end
    if (P == P0) return;
end;
Example
State   a    b
AC      B    AC
B       B    D
D       B    E
E       B    AC
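A Python sketch of the partition-refinement loop of Algorithm 3.39, run on an assumed five-state DFA for (a|b)*abb in which states A and C are equivalent; merging them yields the four groups AC, B, D, E of the example.

```python
def minimize(states, alphabet, move, finals):
    """Return the groups of equivalent states of a DFA.

    Starts from {F, S-F} and splits groups until nothing changes.
    Assumes move is total: move[(s, a)] exists for every state s and symbol a.
    """
    partition = [g for g in (set(finals), set(states) - set(finals)) if g]
    while True:
        group_of = {s: i for i, g in enumerate(partition) for s in g}   # groups of P0
        new_partition = []
        for g in partition:
            # s and t stay together iff, on every symbol, they go to the same group
            subgroups = {}
            for s in g:
                key = tuple(group_of[move[(s, a)]] for a in alphabet)
                subgroups.setdefault(key, set()).add(s)
            new_partition.extend(subgroups.values())
        if len(new_partition) == len(partition):   # no group was split: P == P0
            return new_partition
        partition = new_partition


# Illustrative five-state DFA for (a|b)*abb with accepting state E.
MOVE = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}
groups = minimize("ABCDE", "ab", MOVE, {"E"})
print(sorted(sorted(g) for g in groups))   # [['A', 'C'], ['B'], ['D'], ['E']]
```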
Construct a DFA Directly from a Regular Expression
Implementation Issues
• Input buffering
  • Read in characters one by one
    • Unable to look ahead
    • Inefficient
  • Read in a whole string and store it in memory
    • Requires a big buffer
  • Buffer pairs
Buffer Pairs
Use Sentinels
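A Python sketch of a buffer pair with sentinels, assuming the character '\0' never occurs in the source text (a hypothetical choice of sentinel). Each half holds n characters followed by a sentinel slot, so the common case costs one comparison per character; only on a sentinel does the code check whether it reached the end of a half (reload the other half) or the real end of input. Tracking lexemeBegin and retracting the forward pointer are left out.

```python
import io

SENTINEL = "\0"  # assumption: never appears in the source text


class BufferPair:
    """Layout: [half 0: n chars][sentinel][half 1: n chars][sentinel]."""

    def __init__(self, stream, n=4096):
        self.stream = stream
        self.n = n
        self.buf = [SENTINEL] * (2 * n + 2)
        self.forward = 0
        self._load(0)

    def _load(self, half):
        """Refill one half; a sentinel goes right after the characters read."""
        start = half * (self.n + 1)
        chunk = self.stream.read(self.n)
        self.buf[start:start + len(chunk)] = chunk
        self.buf[start + len(chunk)] = SENTINEL

    def next_char(self):
        """Return the next character, or "" at the end of input."""
        while True:
            c = self.buf[self.forward]
            if c != SENTINEL:                        # common case: a single test
                self.forward += 1
                return c
            if self.forward == self.n:               # sentinel ending half 0
                self._load(1)
                self.forward = self.n + 1
            elif self.forward == 2 * self.n + 1:     # sentinel ending half 1
                self._load(0)
                self.forward = 0
            else:                                    # sentinel inside a half: end of input
                return ""


reader = BufferPair(io.StringIO("position = initial + rate * 60"), n=4)
print("".join(iter(reader.next_char, "")))   # position = initial + rate * 60
```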
Lexical Analyzer
Lex
• A tool for automatically generating lexical analyzers
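Lex Specifications
declarations
%%
translation rules
%%
auxiliary procedures

Translation rules have the form
p1 {action1}
p2 {action2}
...
pn {actionn}
where each pi is a regular expression and each actioni is the code run when pi matches.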
Lex Regular Expressions
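yylex()
yylex() {
    /* the generated scanner repeats this until an action returns or input ends */
    switch (pattern_match()) {   /* number of the rule that matched */
        case 1: {action1} break;
        case 2: {action2} break;
        ...
        case n: {actionn} break;
    }
}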
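pattern_match() hides Lex's matching rule: among all patterns, take the longest match at the current position, and on a tie take the rule listed first. A Python sketch of that rule with a hypothetical rule list; each pattern is matched with Python's `re`, whose alternation is first-match rather than longest-match, which makes no difference for these particular patterns.

```python
import re

# Hypothetical rule list, in the order the rules would appear in a Lex spec.
# Each action receives the matched lexeme (yytext) and returns a token,
# or None when the lexeme should be skipped.
RULES = [
    (r"if|then|begin|end", lambda text: ("KEYWORD", text)),
    (r"[a-z][a-z0-9]*", lambda text: ("ID", text)),
    (r"[0-9]+\.[0-9]*", lambda text: ("FLOAT", text)),
    (r"[0-9]+", lambda text: ("INT", text)),
    (r"[ \t\n]+", lambda text: None),  # eat up white space
]
COMPILED = [(re.compile(pattern), action) for pattern, action in RULES]


def yylex(text):
    """Yield tokens: the longest match wins; on a tie the earlier rule wins."""
    pos = 0
    while pos < len(text):
        best_len, best_action = 0, None
        for regex, action in COMPILED:
            m = regex.match(text, pos)
            # Strictly longer only, so an equally long match keeps the earlier rule.
            if m and m.end() - pos > best_len:
                best_len, best_action = m.end() - pos, action
        if best_action is None:
            raise ValueError(f"Unrecognized character: {text[pos]!r}")
        token = best_action(text[pos:pos + best_len])
        if token is not None:
            yield token
        pos += best_len


print(list(yylex("if iffy then 3.14 42")))
# [('KEYWORD', 'if'), ('ID', 'iffy'), ('KEYWORD', 'then'), ('FLOAT', '3.14'), ('INT', '42')]
```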
Example
%{
#include <stdio.h>
#include <stdlib.h>   /* atoi, atof */
%}
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
%%
{DIGIT}+                               { printf("An integer: %s (%d)\n", yytext, atoi(yytext)); }
{DIGIT}+"."{DIGIT}*                    { printf("A float: %s (%g)\n", yytext, atof(yytext)); }
if|then|begin|end|procedure|function   { printf("A keyword: %s\n", yytext); }
{ID}                                   { printf("An identifier: %s\n", yytext); }
"+"|"-"|"*"|"/"                        { printf("An operator: %s\n", yytext); }
"{"[^}\n]*"}"                          { /* eat up one-line comments */ }
[ \t\n]+                               { /* eat up white space */ }
.                                      { printf("Unrecognized character: %s\n", yytext); }
%%
int main(int argc, char *argv[])
{
    ++argv, --argc;   /* skip over the program name */
    if (argc > 0) {
        yyin = fopen(argv[0], "r");
        if (yyin == NULL) { perror(argv[0]); return 1; }
    } else {
        yyin = stdin;
    }
    yylex();
    return 0;
}

int yywrap(void) { return 1; }