
Presentation Transcript


  1. Scanning, or Lexical Analysis
     • Regular grammars
        • Non-terminals (arbitrary names)
        • Terminals (characters)
        • Productions limited to the following forms:
           • Non-terminal ::= terminal
           • Non-terminal ::= terminal Non-terminal
        • Treat a character class (e.g. digit) as a terminal
     • Regular grammars cannot count without bound: a size limit on identifiers or literals could only be written out as one non-terminal per allowed character position, which is impractical
     • Regular grammars cannot express proper nesting (balanced parentheses)
     Department of Software & Media Technology

  2. Regular Grammars
     • Grammar for real literals with no exponent:
        • digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
        • REALVAL ::= digit REALVAL1
        • REALVAL1 ::= digit REALVAL1   (arbitrary size)
        • REALVAL1 ::= '.' INTEGERVAL
        • INTEGERVAL ::= digit INTEGERVAL   (arbitrary size)
        • INTEGERVAL ::= digit
     • Start symbol: REALVAL
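
Because every production has the form Non-terminal ::= terminal [Non-terminal], the grammar can be checked by a set of mutually recursive functions, one per non-terminal, each consuming exactly one terminal. A minimal Python sketch, assuming the whole input string is one candidate token; the function names are hypothetical and simply mirror REALVAL, REALVAL1, and INTEGERVAL.

```python
DIGITS = "0123456789"


def is_digit(s: str, i: int) -> bool:
    """True if position i of s holds one of the terminals 0..9."""
    return i < len(s) and s[i] in DIGITS


def realval(s: str, i: int = 0) -> bool:
    # REALVAL ::= digit REALVAL1
    return is_digit(s, i) and realval1(s, i + 1)


def realval1(s: str, i: int) -> bool:
    # REALVAL1 ::= digit REALVAL1
    if is_digit(s, i):
        return realval1(s, i + 1)
    # REALVAL1 ::= '.' INTEGERVAL
    return i < len(s) and s[i] == "." and integerval(s, i + 1)


def integerval(s: str, i: int) -> bool:
    if not is_digit(s, i):
        return False
    # INTEGERVAL ::= digit             (this digit ends the token)
    # INTEGERVAL ::= digit INTEGERVAL  (more digits follow)
    return i + 1 == len(s) or integerval(s, i + 1)


for text in ["3.14", "10.0", "42", ".5", "1.", "1.2.3"]:
    print(text, realval(text))
```

Only "3.14" and "10.0" are accepted: the grammar requires at least one digit on each side of exactly one '.'.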

  3. Regular Expressions
     • REs are defined by an alphabet (terminal symbols) and three operations:
        • Alternation: RE1 | RE2
        • Concatenation: RE1 RE2
        • Repetition: RE* (zero or more repetitions of RE)
     • REs describe exactly the languages generated by regular grammars (the regular languages)
     • Regular expressions are more convenient than grammars for some applications
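
For comparison, the real-literal grammar of slide 2 collapses into a single regular expression. This sketch uses Python's standard `re` module; `[0-9]` is the usual shorthand for the alternation 0 | 1 | ... | 9, and the name `REAL_LITERAL` is illustrative.

```python
import re

# digit digit* '.' digit digit*  : concatenation and repetition only;
# [0-9] abbreviates the alternation 0|1|2|3|4|5|6|7|8|9, and \. is a literal '.'
REAL_LITERAL = re.compile(r"[0-9][0-9]*\.[0-9][0-9]*")


def is_real_literal(text: str) -> bool:
    # fullmatch insists that the entire string is one real literal
    return REAL_LITERAL.fullmatch(text) is not None


for text in ["3.14", "42", "1.", "007.5"]:
    print(text, is_real_literal(text))
```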

  4. Finite State Machines or Finite Automata (FSM or FA)
     • A language defined by a grammar is a (possibly infinite) set of strings
     • An automaton is a computation that determines whether a given string belongs to a specified language
     • A finite state machine (FSM) is an automaton that recognizes regular languages (the languages described by regular expressions)
     • Simplest automaton: its only memory is a single number (the current state)

  5. Specifying a Finite Automaton (FA)
     • A set of labeled states, plus directed arcs between states, each labeled with a character
     • One or more states may be terminal (accepting)
     • One state is distinguished as the start state
     • The automaton makes a transition from state S1 to state S2
        • if and only if the arc from S1 to S2 is labeled with the next character in the input
     • A token is legal if the automaton, after reading the whole token, stops in a terminal state
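
A finite automaton specified this way can be run with a few lines of code. The Python sketch below assumes a deterministic machine stored as a dictionary from (state, character) to next state, with missing entries acting as error transitions; the example machine and its state names (`start`, `in_num`) are hypothetical.

```python
from typing import Dict, Set, Tuple

# (current state, input character) -> next state; a missing key means "no arc"
Transitions = Dict[Tuple[str, str], str]


def accepts(transitions: Transitions, start: str, accepting: Set[str], token: str) -> bool:
    state = start
    for ch in token:
        if (state, ch) not in transitions:
            return False  # no arc labeled with the next character: reject
        state = transitions[(state, ch)]
    return state in accepting  # legal only if we stop in an accepting state


# Hypothetical example machine: one or more decimal digits
DIGITS = "0123456789"
int_fa: Transitions = {("start", d): "in_num" for d in DIGITS}
int_fa.update({("in_num", d): "in_num" for d in DIGITS})

print(accepts(int_fa, "start", {"in_num"}, "2024"))  # True
print(accepts(int_fa, "start", {"in_num"}, "20a4"))  # False
```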

  6. FA from Grammar
     • One state for each non-terminal
     • A rule of the form Nt1 ::= terminal
        • generates a transition, labeled by the terminal, from the state for Nt1 to a final state
     • A rule of the form Nt1 ::= terminal Nt2
        • generates a transition from the state for Nt1 to the state for Nt2 on an arc labeled by the terminal
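
The construction can be applied directly to the real-literal grammar of slide 2. In this Python sketch, productions are encoded as hypothetical (Nt1, terminal, Nt2-or-None) triples and `<final>` is an assumed name for the added accepting state. INTEGERVAL receives two arcs on digit, so the resulting machine is non-deterministic; the simulator therefore tracks the set of states the machine could be in.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

FINAL = "<final>"  # hypothetical name for the added accepting state

# Character classes treated as terminals, as on slide 1
CLASSES = {"digit": set("0123456789")}

Production = Tuple[str, str, Optional[str]]  # (Nt1, terminal, Nt2 or None)
Arcs = Dict[Tuple[str, str], Set[str]]


def matches(terminal: str, ch: str) -> bool:
    return ch in CLASSES[terminal] if terminal in CLASSES else ch == terminal


def grammar_to_fa(productions: List[Production]) -> Arcs:
    arcs: Arcs = defaultdict(set)
    for lhs, terminal, rhs in productions:
        # Nt1 ::= terminal      -> arc Nt1 --terminal--> FINAL
        # Nt1 ::= terminal Nt2  -> arc Nt1 --terminal--> Nt2
        arcs[(lhs, terminal)].add(rhs if rhs is not None else FINAL)
    return arcs


def run(arcs: Arcs, start: str, token: str) -> bool:
    current = {start}  # every state the machine could be in so far
    for ch in token:
        current = {dst for (src, term), dsts in arcs.items()
                   if src in current and matches(term, ch) for dst in dsts}
    return FINAL in current


REALVAL_GRAMMAR: List[Production] = [
    ("REALVAL", "digit", "REALVAL1"),
    ("REALVAL1", "digit", "REALVAL1"),
    ("REALVAL1", ".", "INTEGERVAL"),
    ("INTEGERVAL", "digit", "INTEGERVAL"),
    ("INTEGERVAL", "digit", None),
]
arcs = grammar_to_fa(REALVAL_GRAMMAR)
print(run(arcs, "REALVAL", "3.14"))  # True
print(run(arcs, "REALVAL", "3."))    # False
```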

  7. Graphic representation of FA
     [Figure: finite automaton with start state S and a state labeled identifier; arcs labeled letter, digit, and underscore]

  8. FA from RE
     • Each RE corresponds to a grammar
     • For every RE, a natural translation to an FSM exists
     • Alternation often leads to non-deterministic machines
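
One well-known way to carry out this translation is Thompson's construction, which the slide does not name: each operator becomes a small NFA fragment, and fragments are glued together with empty (epsilon) moves. The Python sketch below is a minimal version under that assumption, using the textbook example (a|b)*abb; the alternation fragment is where the non-determinism enters.

```python
from itertools import count
from typing import List, Optional, Set, Tuple

Edge = Tuple[int, Optional[str], int]  # (from, label, to); label None = epsilon move
Frag = Tuple[int, int, List[Edge]]     # (start state, accept state, edges)
_new_state = count()


def lit(c: str) -> Frag:
    s, t = next(_new_state), next(_new_state)
    return (s, t, [(s, c, t)])


def cat(a: Frag, b: Frag) -> Frag:
    # RE1 RE2: epsilon move from RE1's accept state to RE2's start state
    return (a[0], b[1], a[2] + b[2] + [(a[1], None, b[0])])


def alt(a: Frag, b: Frag) -> Frag:
    # RE1 | RE2: the new start state may enter either fragment (non-determinism)
    s, t = next(_new_state), next(_new_state)
    return (s, t, a[2] + b[2] + [(s, None, a[0]), (s, None, b[0]),
                                 (a[1], None, t), (b[1], None, t)])


def star(a: Frag) -> Frag:
    # RE*: skip RE entirely, or loop back for another repetition
    s, t = next(_new_state), next(_new_state)
    return (s, t, a[2] + [(s, None, a[0]), (s, None, t),
                          (a[1], None, a[0]), (a[1], None, t)])


def closure(states: Set[int], edges: List[Edge]) -> Set[int]:
    """All states reachable from `states` using only epsilon moves."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def nfa_matches(frag: Frag, text: str) -> bool:
    start, accept, edges = frag
    current = closure({start}, edges)
    for ch in text:
        current = closure({dst for src, label, dst in edges
                           if src in current and label == ch}, edges)
    return accept in current


# (a|b)*abb
re_abb = cat(cat(cat(star(alt(lit("a"), lit("b"))), lit("a")), lit("b")), lit("b"))
print(nfa_matches(re_abb, "aabb"), nfa_matches(re_abb, "abab"))  # True False
```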

  9. Deterministic Finite Automata (DFA)
     • For every state S and every character C, there is at most one arc from S labeled with C
     • Easier to implement
     • No backtracking
     Conventions for DFA:
     • Error transitions are not explicitly shown
     • Input symbols that result in the same transition are grouped together (the group can even be given a name, such as letter or digit)
     • Still not displayed: stopping conditions and actions
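
These conventions map naturally onto a table-driven implementation: characters are first grouped into named classes, and the table lists only legal transitions, so a missing entry plays the role of the undrawn error transition. The sketch assumes, hypothetically, that an identifier is a letter followed by letters, digits, or underscores; the class and state names are illustrative.

```python
import string
from typing import Dict, Tuple


def char_class(ch: str) -> str:
    """Group input symbols that cause the same transition under one name."""
    if ch in string.ascii_letters:
        return "letter"
    if ch in string.digits:
        return "digit"
    if ch == "_":
        return "underscore"
    return "other"


# Hypothetical identifier DFA: at most one entry per (state, class) pair.
# Error transitions are omitted; a missing entry means "reject".
TABLE: Dict[Tuple[str, str], str] = {
    ("start", "letter"): "in_id",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_id", "underscore"): "in_id",
}
ACCEPTING = {"in_id"}


def is_identifier(text: str) -> bool:
    state = "start"
    for ch in text:
        nxt = TABLE.get((state, char_class(ch)))
        if nxt is None:  # implicit error transition
            return False
        state = nxt
    return state in ACCEPTING


for text in ["x1_y", "1x", "a-b"]:
    print(text, is_identifier(text))
```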

  10. Non-Deterministic Finite Automata (NFA)
     • An NFA has at least one state with two arcs, labeled with the same character, that lead to two distinct states
     • Example: from the start state, a digit can begin either an integer literal or a real literal
     • A direct implementation requires backtracking
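
The int-or-real example can be made concrete. In this Python sketch the NFA's state names are hypothetical; the recognizer tries the arcs out of each state in order and, when a path fails, backs up and tries the next one. Such backtracking can take exponential time in the worst case, which is one motivation for converting to a DFA.

```python
from typing import Dict, List, Optional, Tuple

DIGITS = set("0123456789")

# Hypothetical NFA: from "start", a digit may begin an integer ("int")
# or the whole-number part of a real literal ("real_whole").
ARCS: Dict[str, List[Tuple[str, str]]] = {
    "start": [("digit", "int"), ("digit", "real_whole")],
    "int": [("digit", "int")],
    "real_whole": [("digit", "real_whole"), (".", "real_dot")],
    "real_dot": [("digit", "real_frac")],
    "real_frac": [("digit", "real_frac")],
}
ACCEPTING = {"int": "INTEGER", "real_frac": "REAL"}


def label_matches(label: str, ch: str) -> bool:
    return ch in DIGITS if label == "digit" else ch == label


def classify(text: str, state: str = "start", i: int = 0) -> Optional[str]:
    """Return the token kind, trying arcs in order and backtracking on failure."""
    if i == len(text):
        return ACCEPTING.get(state)
    for label, nxt in ARCS.get(state, []):
        if label_matches(label, text[i]):
            result = classify(text, nxt, i + 1)
            if result is not None:
                return result
            # this path failed: fall through and try the next arc (backtrack)
    return None


for text in ["123", "12.5", "12."]:
    print(text, classify(text))
```

For "12.5" the recognizer first follows the integer branch, fails at '.', and only then backs up to the start state and takes the real-literal branch.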

  11. Lookahead & Backtracking in NFA
     [Figure: identifier automaton with states start, in_id, and finish; arcs labeled letter, digit, and [other]; action at finish: return id]
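
Even a deterministic scanner needs lookahead and limited backtracking: it keeps reading while a longer token is still possible, remembers the last position where it was in an accepting state, and returns to that position when it gets stuck. This Python sketch assumes a hypothetical DFA for integer and real literals, and follows the common textbook convention (assumed here) that a bracketed label such as [other] marks a character that is examined but not consumed.

```python
from typing import Optional, Tuple

DIGITS = set("0123456789")


# Hypothetical DFA for INTEGER | REAL (digits, optionally followed by '.' digits)
def step(state: str, ch: str) -> Optional[str]:
    if state == "start" and ch in DIGITS:
        return "in_int"
    if state == "in_int":
        if ch in DIGITS:
            return "in_int"
        if ch == ".":
            return "dot"
    if state in ("dot", "in_frac") and ch in DIGITS:
        return "in_frac"
    return None  # error transition


ACCEPTING = {"in_int": "INTEGER", "in_frac": "REAL"}


def next_token(text: str, pos: int) -> Tuple[Optional[str], str, int]:
    """Longest-match scan from pos; back up to the last accepting position."""
    state: Optional[str] = "start"
    i = pos
    last_kind, last_end = None, pos
    while i < len(text):
        state = step(state, text[i])
        if state is None:
            break  # text[i] was only looked at (lookahead), not consumed
        i += 1
        if state in ACCEPTING:
            last_kind, last_end = ACCEPTING[state], i
    # characters read after last_end are given back (backtracking)
    return last_kind, text[pos:last_end], last_end


print(next_token("12.x", 0))
print(next_token("12.5+", 0))
```

On "12.x" the scanner reads "12." hoping for a real literal, gets stuck at 'x', and backs up so that "12" is returned as an INTEGER and scanning resumes at '.'.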

  12. Implementation of FA
     [Figure: identifier automaton with states start, in_id, and finish; arcs labeled letter, digit, and [other]; action at finish: return id]
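
One straightforward implementation keeps the current state in a variable and branches on it inside a loop. This Python sketch follows the start, in_id, and finish states of the identifier figure, assuming identifiers are ASCII letters followed by letters or digits; the token name `ID` and the return convention are hypothetical.

```python
import string
from typing import Optional, Tuple

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def scan_identifier(text: str, pos: int) -> Optional[Tuple[str, str, int]]:
    """Return ("ID", lexeme, next_pos), or None if no identifier starts at pos."""
    state, i = "start", pos
    while state != "finish":
        ch = text[i] if i < len(text) else ""  # "" stands for end of input
        if state == "start":
            if ch in LETTERS:
                state, i = "in_id", i + 1  # consume the first letter
            else:
                return None  # error transition (not drawn in the figure)
        elif state == "in_id":
            if ch in LETTERS or ch in DIGITS:
                i += 1  # loop on letter or digit
            else:
                state = "finish"  # [other]: inspect ch but do not consume it
    return "ID", text[pos:i], i  # finish: return id


print(scan_identifier("count1 = 0", 0))
print(scan_identifier("9lives", 0))
```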

  13. From RE to DFA & RE to NFA
     [Figure: identifier automaton with states start, in_id, and finish; arcs labeled letter, digit, and [other]; action at finish: return id]

  14. NFA to DFA
     • There is an algorithm for converting a non-deterministic machine to a deterministic one
     • The result may have exponentially more states
     • Intuitively: new states are needed to express uncertainty about the token (int or real)
     • Other algorithms exist for minimizing the number of states of an FSM, for showing equivalence of FSMs, etc.
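
The subset construction is the standard algorithm of this kind: each DFA state stands for the set of NFA states the machine might be in. This Python sketch assumes an NFA without empty moves, with character classes (`digit`, `.`) treated as symbols; the state names are hypothetical. The DFA state {int, real_whole} is exactly the "int or real" uncertainty mentioned above.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical NFA without epsilon moves: state -> list of (symbol, next_state)
NFA: Dict[str, List[Tuple[str, str]]] = {
    "start": [("digit", "int"), ("digit", "real_whole")],
    "int": [("digit", "int")],
    "real_whole": [("digit", "real_whole"), (".", "real_dot")],
    "real_dot": [("digit", "real_frac")],
    "real_frac": [("digit", "real_frac")],
}
NFA_ACCEPTING = {"int", "real_frac"}

DState = FrozenSet[str]


def subset_construction(nfa: Dict[str, List[Tuple[str, str]]], start: str,
                        alphabet: List[str]) -> Tuple[DState, Dict[DState, Dict[str, DState]]]:
    """Each DFA state is the set of NFA states the machine might be in."""
    start_set: DState = frozenset({start})
    dfa: Dict[DState, Dict[str, DState]] = {}
    worklist = [start_set]
    while worklist:
        current = worklist.pop()
        if current in dfa:
            continue  # already expanded
        dfa[current] = {}
        for sym in alphabet:
            target = frozenset(dst for q in current
                               for s, dst in nfa.get(q, []) if s == sym)
            if target:  # the empty set is the error state, left implicit
                dfa[current][sym] = target
                worklist.append(target)
    return start_set, dfa


start, dfa = subset_construction(NFA, "start", ["digit", "."])
for state, moves in dfa.items():
    accepting = bool(state & NFA_ACCEPTING)  # accept if any NFA state accepts
    print(sorted(state), {sym: sorted(t) for sym, t in moves.items()}, accepting)
```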

  15. Example DFA

  16. Another view of the same DFA

  17. Yet another view of the same DFA

  18. State Minimization in DFA
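
The slide gives no algorithm; one common approach, assumed here, is Moore-style partition refinement: start by separating accepting from non-accepting states, then keep splitting any group whose members move into different groups on some symbol. This Python sketch returns a group number for each state, and states with the same number can be merged. It assumes all states are reachable and that missing transitions lead to an implicit error state; the example DFA is hypothetical.

```python
from typing import Dict


def minimize(states, alphabet, delta, accepting):
    """Moore-style partition refinement; returns a map state -> group id.

    delta maps (state, symbol) -> state; a missing entry is treated as a move
    to an implicit error state, represented by group id -1.
    """
    # Initial partition: accepting vs. non-accepting
    block = {q: (1 if q in accepting else 0) for q in states}
    while True:
        # Signature: own group plus the group reached on each symbol
        sig = {q: (block[q],) + tuple(block[delta[(q, a)]] if (q, a) in delta else -1
                                      for a in alphabet)
               for q in states}
        ids: Dict[tuple, int] = {}
        new_block = {q: ids.setdefault(sig[q], len(ids)) for q in states}
        if len(ids) == len(set(block.values())):
            return new_block  # no group was split: the partition is stable
        block = new_block


# Hypothetical DFA for digit+ with a redundant state: A and B behave the same
states = ["start", "A", "B"]
delta = {("start", "digit"): "A", ("A", "digit"): "B", ("B", "digit"): "B"}
print(minimize(states, ["digit"], delta, {"A", "B"}))  # {'start': 0, 'A': 1, 'B': 1}
```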

  19. TINY DFA

  20. Lex for Scanner
     • Lex conventions for REs
     • Format of a Lex input file
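
Lex turns such an input file into a C scanner. As a rough illustration of how the generated scanner chooses among rules, this Python sketch (not Lex itself) emulates two Lex conventions: the longest match wins, and among matches of equal length the rule listed first wins. The rule list and token names are hypothetical.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical rules in priority order, mimicking a Lex rules section;
# a token name of None means "match and discard" (like whitespace rules).
RULES: List[Tuple[Pattern[str], Optional[str]]] = [
    (re.compile(r"if"), "IF"),
    (re.compile(r"[A-Za-z][A-Za-z0-9]*"), "ID"),
    (re.compile(r"[0-9]+\.[0-9]+"), "REAL"),
    (re.compile(r"[0-9]+"), "INT"),
    (re.compile(r"[ \t\n]+"), None),
]


def tokenize(text: str) -> List[Tuple[str, str]]:
    pos, tokens = 0, []
    while pos < len(text):
        best_len, best_kind = 0, None
        for pattern, kind in RULES:
            m = pattern.match(text, pos)
            # keep the longest match; on a tie the earlier rule is kept
            if m and m.end() - pos > best_len:
                best_len, best_kind = m.end() - pos, kind
        if best_len == 0:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if best_kind is not None:
            tokens.append((best_kind, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("if iffy 3.14 42"))
```

With this ordering, `if` is returned as IF rather than ID because both rules match two characters and the IF rule comes first, while `iffy` becomes ID because that match is longer.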
