
Two issues in lexical analysis Specifying tokens (regular expression) – last lecture

Learn how regular expressions define tokens and how recognizers identify them. Explore deterministic and non-deterministic finite automata for language recognition.

francor

Presentation Transcript


  1. Two issues in lexical analysis
     • Specifying tokens (regular expressions) – last lecture
     • Identifying tokens specified by regular expressions – today’s topic

  2. How to recognize tokens specified by regular expressions?
     • A recognizer for a language is a program that takes a string x as input and answers “yes” if x is a sentence of the language and “no” otherwise.
     • In the context of lexical analysis, given a string and a regular expression, a recognizer of the language specified by the regular expression answers “yes” if the string is in the language.
     • How to recognize the regular expression int? What about int | for?
     [Figure: two transition diagrams for int, each with states 0, 1, 2, 3 connected by edges labeled i, n, t; one of them also shows "All other characters" leading to "error".]

  3. How to recognize tokens specified by regular expressions?
     • How to recognize the regular expression int | for?
     • A regular expression can be compiled into a recognizer (automatically) by constructing a finite automaton, which can be deterministic or non-deterministic.
     [Figure: transition diagram with states 0 to 6; edges labeled i, n, t pass through states 1, 2, 3 and edges labeled f, o, r pass through states 4, 5, 6.]
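As a minimal sketch, the recognizer for int | for from this slide can be written as a table-driven deterministic automaton. The state numbering (0 to 3 for int, 4 to 6 for for) is read off the figure labels and is an assumption, as are the names `TRANSITIONS`, `ACCEPTING` and `recognize`.

```python
# Transition table: state -> {character -> next state}.
# Assumed layout: 0 -i-> 1 -n-> 2 -t-> 3 and 0 -f-> 4 -o-> 5 -r-> 6.
TRANSITIONS = {
    0: {"i": 1, "f": 4},
    1: {"n": 2},
    2: {"t": 3},
    4: {"o": 5},
    5: {"r": 6},
}
ACCEPTING = {3, 6}  # end of "int" and end of "for"


def recognize(text: str) -> bool:
    """Answer "yes" (True) if text is in the language of int | for."""
    state = 0
    for ch in text:
        # A missing table entry plays the role of the
        # "all other characters -> error" edge.
        next_state = TRANSITIONS.get(state, {}).get(ch)
        if next_state is None:
            return False
        state = next_state
    return state in ACCEPTING
```

Here `recognize("int")` and `recognize("for")` answer yes, while prefixes such as `"in"` or longer strings such as `"fort"` are rejected.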

  4. Non-deterministic finite automata (NFA)
     • A non-deterministic finite automaton (NFA) is a mathematical model that consists of a 5-tuple (Q, Σ, δ, q0, F):
       • a set of states Q
       • a set of input symbols Σ
       • a transition function δ that maps state-symbol pairs to sets of states
       • a state q0 that is distinguished as the start (initial) state
       • a set of states F distinguished as accepting (final) states
     • An NFA accepts an input string x if and only if there is some path in the transition graph from the start state to some accepting state (after consuming x).
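The definition can be mirrored almost literally in code. The sketch below stores the five components in a dataclass and checks acceptance by searching for some path that consumes x and ends in an accepting state. The class name `NFA`, its field names, and the example automaton (strings over a and b that end in ab) are illustrative assumptions, and ε-transitions are left out because this slide's definition does not mention them.

```python
from dataclasses import dataclass


@dataclass
class NFA:
    states: set      # Q
    alphabet: set    # the set of input symbols
    delta: dict      # (state, symbol) -> set of next states
    start: str       # q0
    accepting: set   # F

    def accepts(self, x: str, state=None) -> bool:
        """True iff some path from `state` consumes all of x and ends in F."""
        if state is None:
            state = self.start
        if not x:
            return state in self.accepting
        # Try every transition allowed for the next symbol;
        # a single successful path is enough.
        return any(
            self.accepts(x[1:], nxt)
            for nxt in self.delta.get((state, x[0]), set())
        )


# Hypothetical example: strings over {a, b} that end in "ab".
ends_in_ab = NFA(
    states={"q0", "q1", "q2"},
    alphabet={"a", "b"},
    delta={
        ("q0", "a"): {"q0", "q1"},   # two choices on 'a': the non-determinism
        ("q0", "b"): {"q0"},
        ("q1", "b"): {"q2"},
    },
    start="q0",
    accepting={"q2"},
)
```

The recursive search follows the definition directly but can take exponential time; tracking the set of reachable states after each symbol avoids that.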

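  5. Finite State Machines = Regular Expression Recognizers
     • relop → < | <= | <> | > | >= | =
       Transition diagram (start state 0; * as marked in the figure):

       State   Input   Next state   Action at next state
       0       <       1
       0       =       5            return(relop, EQ)
       0       >       6
       1       =       2            return(relop, LE)
       1       >       3            return(relop, NE)
       1       other   4 *          return(relop, LT)
       6       =       7            return(relop, GE)
       6       other   8 *          return(relop, GT)

     • id → letter ( letter | digit )*
       Transition diagram (start state 9; * as marked in the figure):

       State   Input             Next state   Action at next state
       9       letter            10
       10      letter or digit   10
       10      other             11 *         return(gettoken(), install_id())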
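The two transition diagrams on this slide translate fairly directly into hand-written scanning functions. The sketch below is one possible reading, not the lecture's own code: the function names `scan_relop` and `scan_id` are hypothetical, the `*` on states 4, 8 and 11 is interpreted as "the other character is not consumed" (the usual textbook convention, assumed here), and `gettoken()` / `install_id()` are replaced by simply returning the lexeme.

```python
def scan_relop(text: str, pos: int = 0):
    """Return (("relop", attr), next_pos), or None if no relop starts at pos."""

    def peek(i: int) -> str:
        return text[i] if i < len(text) else ""  # "" stands for end of input

    c = peek(pos)
    if c == "<":                                  # state 1
        nxt = peek(pos + 1)
        if nxt == "=":
            return ("relop", "LE"), pos + 2       # state 2
        if nxt == ">":
            return ("relop", "NE"), pos + 2       # state 3
        return ("relop", "LT"), pos + 1           # state 4*: "other" not consumed
    if c == "=":
        return ("relop", "EQ"), pos + 1           # state 5
    if c == ">":                                  # state 6
        if peek(pos + 1) == "=":
            return ("relop", "GE"), pos + 2       # state 7
        return ("relop", "GT"), pos + 1           # state 8*: "other" not consumed
    return None


def scan_id(text: str, pos: int = 0):
    """Return (("id", lexeme), next_pos), or None if no identifier starts at pos."""
    # Assumption: "letter" and "digit" are approximated by str.isalpha/isalnum.
    if pos >= len(text) or not text[pos].isalpha():
        return None                               # state 9 has no edge for this input
    end = pos + 1
    while end < len(text) and text[end].isalnum():
        end += 1                                  # loop on state 10
    return ("id", text[pos:end]), end             # state 11*: "other" not consumed
```

Each function returns the position just past the token, so a caller can continue scanning from there.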

  6. An NFA is non-deterministic in that (1) the same character can label two or more transitions out of one state, and (2) the empty string can label transitions.
     • For example, here is an NFA that recognizes the language ?
     • An NFA can easily be implemented using a transition table, which can be used in the language recognizer.

       State   a        b
       0       {0, 1}   {0}
       1       -        {2}
       2       -        {3}

     [Figure: transition graph over states 0, 1, 2, 3 with edges labeled a, a, b, b, b.]
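A table-driven recognizer for this NFA can be sketched as follows. The table entries are copied from the slide; treating state 0 as the start state and state 3 as the only accepting state is an assumption (state 3 has no row in the table), and the names `TABLE` and `nfa_accepts` are illustrative. Read this way, the table is consistent with the expression (a|b)*abb.

```python
# Transition table from the slide: state -> {symbol -> set of next states}.
TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
}
START = 0
ACCEPTING = {3}  # assumption: state 3 is the accepting state


def nfa_accepts(text: str) -> bool:
    """Simulate the NFA by tracking every state it could be in."""
    current = {START}
    for ch in text:
        # Follow all transitions on ch from all current states.
        current = {
            nxt
            for state in current
            for nxt in TABLE.get(state, {}).get(ch, set())
        }
        if not current:        # no transition possible: reject early
            return False
    return bool(current & ACCEPTING)
```

This table has no transitions on the empty string, so no ε-closure step is needed; an NFA that uses them would have to extend `current` with all states reachable by empty-string moves after each step.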

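  7. From a Regular Expression to an NFA
     • For ε: start → i, with a single ε-edge from i to f
     • For a: start → i, with a single edge labeled a from i to f
     • For r1 | r2: start → i, with ε-edges from i into N(r1) and into N(r2), and ε-edges from N(r1) and from N(r2) into f
     • For r1 r2: start → i, then N(r1) followed directly by N(r2), ending in f
     • For r*: start → i, with an ε-edge from i into N(r), an ε-edge from N(r) into f, an ε-edge from the end of N(r) back to its start, and an ε-edge from i directly to f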
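The construction rules on this slide can be sketched as a small builder in which every rule returns a pair (i, f) of start and final state. All names here (`Builder`, `EPS`, `closure`, `accepts`) are hypothetical, and one detail is a deliberate simplification: concatenation links N(r1) to N(r2) with an ε-edge instead of merging the two states, which accepts the same language.

```python
import itertools

EPS = None  # label used for ε-edges


class Builder:
    def __init__(self):
        self.edges = []                 # list of (source, label, target)
        self._ids = itertools.count()

    def _pair(self):
        return next(self._ids), next(self._ids)

    def symbol(self, a):                # rule for a (or for ε when a is EPS)
        i, f = self._pair()
        self.edges.append((i, a, f))
        return i, f

    def union(self, n1, n2):            # rule for r1 | r2
        i, f = self._pair()
        for sub_i, sub_f in (n1, n2):
            self.edges.append((i, EPS, sub_i))
            self.edges.append((sub_f, EPS, f))
        return i, f

    def concat(self, n1, n2):           # rule for r1 r2 (ε-link simplification)
        self.edges.append((n1[1], EPS, n2[0]))
        return n1[0], n2[1]

    def star(self, n):                  # rule for r*
        i, f = self._pair()
        sub_i, sub_f = n
        self.edges += [(i, EPS, sub_i), (sub_f, EPS, f),
                       (sub_f, EPS, sub_i), (i, EPS, f)]
        return i, f


def closure(edges, states):
    """All states reachable from `states` using only ε-edges."""
    seen, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for src, label, dst in edges:
            if src == s and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(edges, nfa, text):
    start, final = nfa
    current = closure(edges, {start})
    for ch in text:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = closure(edges, moved)
    return final in current


# Build an NFA for (a|b)*abb out of the rules above.
builder = Builder()
a_or_b_star = builder.star(builder.union(builder.symbol("a"), builder.symbol("b")))
nfa = a_or_b_star
for ch in "abb":
    nfa = builder.concat(nfa, builder.symbol(ch))
```

With this setup, `accepts(builder.edges, nfa, "babb")` answers yes and `accepts(builder.edges, nfa, "ab")` answers no.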

  8. Examples:
     • a
     • a | b
     • ab
     • a*b
     • (a|b)*abb
     • An NFA can be converted into a deterministic finite-state automaton (DFA) to improve efficiency.
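The slides do not say how the conversion is done; one standard technique is the subset construction, sketched below on the NFA transition table from slide 6. Each DFA state is the set of NFA states reachable after reading the same input. The function names are hypothetical, state 3 is assumed to be the accepting NFA state, and the sketch assumes an NFA without ε-transitions (with them, each set would first be extended by its ε-closure).

```python
# NFA table from slide 6: state -> {symbol -> set of next states}.
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
}


def subset_construction(table, start, accepting, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    start_set = frozenset({start})
    dfa = {}                       # DFA state -> {symbol -> DFA state}
    worklist = [start_set]
    while worklist:
        current = worklist.pop()
        if current in dfa:
            continue               # already expanded
        dfa[current] = {}
        for sym in alphabet:
            target = frozenset(
                nxt for s in current for nxt in table.get(s, {}).get(sym, ())
            )
            if target:             # an empty set would be the error state
                dfa[current][sym] = target
                worklist.append(target)
    dfa_accepting = {s for s in dfa if s & accepting}
    return dfa, start_set, dfa_accepting


def dfa_accepts(dfa, start, accepting, text):
    state = start
    for ch in text:
        state = dfa[state].get(ch)
        if state is None:          # no transition: reject
            return False
    return state in accepting


dfa, dfa_start, dfa_accepting = subset_construction(
    NFA_TABLE, start=0, accepting={3}, alphabet="ab"
)
```

For this NFA the construction yields four DFA states, {0}, {0, 1}, {0, 2} and {0, 3}. The DFA follows exactly one transition per input character, which is the efficiency gain the slide refers to.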
