Lexical Analysis (Scanning)

Presentation Transcript


    1. Lexical Analysis (Scanning). SC4312 – Compiler Construction, Lecture 2a. Originally prepared by Dr. Songsak Channarukul; modified by Dr. Kwankamol Nongpong.

    2. Introduction: Computer languages must be precise. Both syntax and semantics must be unambiguous. Language designers use formal notations to define syntax and semantics. This class covers the formal notation for syntax.

    3. Specifying Syntax: A formal specification of syntax requires a set of rules. A computer language needs two sets of rules to specify its syntax. Lexical rules define which combinations of characters form valid tokens. Syntactic rules define which combinations of tokens are valid.

    4. Formal Notations for Syntax: Regular expressions (RE) are used to specify lexical rules; an RE generates a regular language. Context-free grammars (CFG) are used to specify syntactic rules; a CFG generates a context-free language.

    5. Numerals in Formal Notation: Arabic numerals are composed of digits (0, 1, 2, …, 9).
        digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
       A natural number is an arbitrary-length string of digits beginning with a nonzero digit.
        nonzero_digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
        natural_number → nonzero_digit digit*
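The three rules above translate almost symbol for symbol into a regular expression. The following is a minimal sketch using Python's `re` module; the names `DIGIT`, `NONZERO_DIGIT`, `NATURAL_NUMBER`, and `is_natural_number` are illustrative and do not come from the lecture.

```python
import re

# Direct transcription of the slide's rules (names are illustrative):
#   digit          -> 0 | 1 | ... | 9
#   nonzero_digit  -> 1 | 2 | ... | 9
#   natural_number -> nonzero_digit digit*
DIGIT = r"[0-9]"
NONZERO_DIGIT = r"[1-9]"
NATURAL_NUMBER = re.compile(NONZERO_DIGIT + DIGIT + r"*")


def is_natural_number(text: str) -> bool:
    """Return True if the whole string matches natural_number."""
    return NATURAL_NUMBER.fullmatch(text) is not None


print(is_natural_number("2024"))  # True
print(is_natural_number("007"))   # False: starts with a zero digit
print(is_natural_number("0"))     # False under this definition
```

Note that under this definition the string "0" is not a natural number, because every match must begin with a nonzero digit.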

    6. The Kleene Star (*): The Kleene star (*) indicates zero or more repetitions of the symbol or expression it follows. It is named after its inventor, Stephen Kleene (pronounced "klay-nee").

    7. Lexical Analysis: A scanner converts a stream of characters into a stream of tokens.

    8. Lexical Analysis (cont.): The goal of a scanner is to simplify the parser's task. Scanners are usually much faster than parsers. Scanners discard characters that are irrelevant to the parser (e.g., comments and white space). Parsers are concerned only with tokens: they do not care what an identifier is called, only that it is an identifier.

    9. Tasks of Lexical Analysis: A scanner groups a stream of characters into tokens, removes irrelevant characters such as comments and white space, and deals with pragmas.
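The first two tasks above can be sketched in a few lines. The token classes below (`COMMENT`, `WHITESPACE`, `NUMBER`, `IDENT`, `OP`) describe a hypothetical toy language and are assumptions made for illustration, not the lecture's token set; pragma handling is left out.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token classes for a toy language. Python tries the
# alternatives left to right, so the order of this list matters.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),
    ("WHITESPACE", r"\s+"),
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[-+*/=()]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
SKIPPED = {"COMMENT", "WHITESPACE"}  # irrelevant to the parser


def scan(source: str) -> Iterator[Token]:
    """Group characters into tokens, discarding comments and white space."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        if match.lastgroup not in SKIPPED:
            yield Token(match.lastgroup, match.group())


for token in scan("total = price * 3  # compute cost"):
    print(token)
```

The parser that consumes this stream sees only the token kinds and their text; the comment and the spacing never reach it.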

    10. Pragmas: Pragmas are significant comments. They provide directives or hints to the compiler. They do not change the meaning of the program, only how it is compiled. Examples include the inclusion or exclusion of code, how code is used, and code improvements.

    11. Describing Tokens: An alphabet (Σ) is a finite set of symbols (e.g., {0, 1}, {A, B, C, …, Z}, ASCII, Unicode). A string is a finite sequence of symbols from an alphabet (e.g., ε, ABAC, student). A language is a set of strings over an alphabet (e.g., ∅ (the empty language), all English words, or all strings that start with a letter followed by any sequence of letters and digits).

    12. Now What?: We need a way to describe which patterns of characters are legal according to the language specification. Models for this include finite automata and regular expressions.

    13. Regular Language: Any set of strings that can be defined using the regular operations is called a regular language.

    14. Regular Operations: Let A, B ∈ RL.
        A ∪ B = {x | x ∈ A or x ∈ B}
        A ∩ B = {x | x ∈ A and x ∈ B}
        A* = {x1x2…xk | k ≥ 0 and ∀i, xi ∈ A}
       The class of regular languages is closed under ∪, ∩, *, −, and ∘. Can you prove it?
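For finite languages, the set definitions above can be tried out directly with Python sets. This is a minimal sketch: the function names are illustrative, concatenation is included because star is built from it, and `star` takes a length bound (an assumption of this sketch) because A* is infinite whenever A contains a nonempty string.

```python
def union(a: set[str], b: set[str]) -> set[str]:
    """A ∪ B = {x | x in A or x in B}."""
    return a | b


def intersection(a: set[str], b: set[str]) -> set[str]:
    """A ∩ B = {x | x in A and x in B}."""
    return a & b


def concatenation(a: set[str], b: set[str]) -> set[str]:
    """{xy | x in A and y in B}."""
    return {x + y for x in a for y in b}


def star(a: set[str], max_len: int) -> set[str]:
    """The strings of A* whose length is at most max_len."""
    result = {""}      # k = 0 contributes the empty string
    frontier = {""}
    while frontier:
        # Append one more element of A to every string found in the last round.
        frontier = {x + y for x in frontier for y in a if len(x + y) <= max_len} - result
        result |= frontier
    return result


A = {"ab", "c"}
B = {"c", "d"}
print(sorted(union(A, B)))
print(sorted(intersection(A, B)))
print(sorted(concatenation(A, B)))
print(sorted(star(A, 3), key=lambda s: (len(s), s)))
```

This only illustrates what the operations produce; it is not a proof of the closure properties, which have to hold for infinite regular languages as well.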

    15. Finite Automata: Models for computers with an extremely limited amount of memory. Singular: automaton. Plural: automata.

    16. Finite Automata: A finite automaton is a 5-tuple (Q, Σ, δ, q0, F), where
        Q is a finite set of states,
        Σ is a finite alphabet,
        δ : Q × Σ → Q is the transition function,
        q0 ∈ Q is the start state, and
        F ⊆ Q is the set of final states.
       A language is regular (hence "regular language") if some finite automaton recognizes it.

    17. Transition Function: Like any function, the transition function can also be represented as a table.
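The 5-tuple and its transition table map naturally onto a small data structure. The sketch below is one possible encoding, with the table stored as a dictionary keyed by (state, symbol); the example automaton, which accepts binary strings containing an even number of 1s, is hypothetical and is not the one shown in the lecture.

```python
from typing import NamedTuple


class DFA(NamedTuple):
    states: frozenset    # Q
    alphabet: frozenset  # Σ
    delta: dict          # transition table: (state, symbol) -> state
    start: str           # q0
    finals: frozenset    # F


def accepts(dfa: DFA, text: str) -> bool:
    """Run the DFA over text and report whether it ends in a final state."""
    state = dfa.start
    for symbol in text:
        if symbol not in dfa.alphabet:
            return False  # symbol outside Σ: the string is not in the language
        state = dfa.delta[(state, symbol)]
    return state in dfa.finals


# Hypothetical example: binary strings with an even number of 1s.
#            |  0     |  1
#   even (F) |  even  |  odd
#   odd      |  odd   |  even
EVEN_ONES = DFA(
    states=frozenset({"even", "odd"}),
    alphabet=frozenset({"0", "1"}),
    delta={
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd", ("odd", "1"): "even",
    },
    start="even",
    finals=frozenset({"even"}),
)

print(accepts(EVEN_ONES, "1001"))  # True
print(accepts(EVEN_ONES, "1011"))  # False
```

The only memory the automaton has is the current state, which is why the loop carries a single variable from one symbol to the next.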

    18. Nondeterminism: A finite automaton is nondeterministic if, for some state and input symbol, there is more than one possible next state. Nondeterminism is a generalization of determinism. The formal definition is the same as that of a DFA except for the transition function.

    19. Nondeterministic Finite Automaton: A nondeterministic finite automaton (NFA) is a 5-tuple (Q, Σ, δ, q0, F), where
        Q is a finite set of states,
        Σ is a finite alphabet,
        δ : Q × Σε → P(Q) is the transition function, with Σε = Σ ∪ {ε} and P(Q) the power set of Q,
        q0 ∈ Q is the start state, and
        F ⊆ Q is the set of final states.
       An NFA accepts a string if at least one path leads to an accepting state.
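"At least one path leads to an accepting state" can be checked without backtracking by tracking the set of all states the NFA could be in. The following is a minimal sketch under some assumptions: ε is encoded as the empty string, missing table entries mean the empty set, and the example NFA (strings over {0, 1} that contain 11 or 101) is hypothetical rather than taken from the lecture.

```python
EPSILON = ""  # assumed encoding of ε in the transition table


def epsilon_closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in delta.get((state, EPSILON), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)


def nfa_accepts(delta, start, finals, text):
    """Accept if at least one path through the NFA ends in a final state."""
    current = epsilon_closure({start}, delta)
    for symbol in text:
        moved = set()
        for state in current:
            moved |= delta.get((state, symbol), set())
        current = epsilon_closure(moved, delta)
    return bool(current & finals)


# Hypothetical NFA: delta maps (state, symbol) to a *set* of states.
DELTA = {
    ("q1", "0"): {"q1"},
    ("q1", "1"): {"q1", "q2"},   # two possible next states on the same input
    ("q2", "0"): {"q3"},
    ("q2", EPSILON): {"q3"},     # ε-transition
    ("q3", "1"): {"q4"},
    ("q4", "0"): {"q4"},
    ("q4", "1"): {"q4"},
}
FINALS = {"q4"}

print(nfa_accepts(DELTA, "q1", FINALS, "010110"))  # True
print(nfa_accepts(DELTA, "q1", FINALS, "0100"))    # False
```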

    20. Transition Functions

    21. Regular Expressions: Regular expressions are a standard way to express the languages of tokens. A regular expression is one of the following:
        a, where a ∈ Σ (i.e., the language {a})
        ε (i.e., the empty string)
        ∅ (i.e., the empty language)
        R1 ∪ R2, where R1, R2 ∈ RE
        R1R2, where R1, R2 ∈ RE
        R1*, where R1 ∈ RE
       A language is regular if some RE describes it.

    22. Conversions: RE → NFA; DFA → NFA → RE; RE → NFA → DFA.

    23. Exercise: Construct a finite automaton for 0*1(1 | 0)*.

    24. Exercise: Convert the following NFA to a DFA.

    25. The new start state is the set containing the original start state (in this case 1) together with every state that can be reached from it via ε-transitions. The accept states of the DFA are the sets that include an original accept state (in this case 1).
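The two rules above are the start and end of the subset construction. The sketch below shows the whole procedure under stated assumptions: ε is encoded as the empty string, and the three-state NFA is a hypothetical stand-in with start state 1 and accept state 1, since the NFA drawn in the exercise is not reproduced in this transcript.

```python
from collections import deque

EPSILON = ""  # assumed encoding of ε in the transition table


def epsilon_closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in delta.get((state, EPSILON), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)


def subset_construction(delta, start, finals, alphabet):
    """Build a DFA whose states are sets of NFA states."""
    # New start state: the original start state plus everything ε-reachable from it.
    dfa_start = epsilon_closure({start}, delta)
    dfa_delta = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        subset = queue.popleft()
        for symbol in sorted(alphabet):
            moved = set()
            for state in subset:
                moved |= delta.get((state, symbol), set())
            target = epsilon_closure(moved, delta)
            dfa_delta[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # Accept states: every subset that contains an original accept state.
    dfa_finals = {subset for subset in seen if subset & finals}
    return dfa_start, dfa_delta, dfa_finals


# Hypothetical NFA over {a, b} with start state 1 and accept state 1.
NFA_DELTA = {
    (1, "b"): {2},
    (1, EPSILON): {3},
    (2, "a"): {2, 3},
    (2, "b"): {3},
    (3, "a"): {1},
}
dfa_start, dfa_delta, dfa_finals = subset_construction(NFA_DELTA, 1, {1}, {"a", "b"})

print(sorted(dfa_start))
print(sorted(sorted(subset) for subset in dfa_finals))
```

Only the subsets reachable from the new start state are generated, so the resulting DFA can have far fewer than the 2^|Q| states of the full power set.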

    26. DFA

    27. Pictorial representation of a Pascal scanner as a finite automaton.

    28. Lexical Errors: Lexical errors occur in the scanning phase of compilation. They are relatively easy to handle:
        Throw away the current, invalid token.
        Skip forward until a character is found that can legitimately begin a new token.
        Restart the scanning algorithm.
        Count on the error-recovery mechanism of the parser…
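The first three recovery steps above can be sketched as a small loop. The token pattern and the function name `scan_with_recovery` are assumptions made for illustration; the last step, relying on the parser's own error recovery, is outside the scope of this sketch.

```python
import re

# Hypothetical token classes for a toy language.
TOKEN = re.compile(
    r"(?P<NUMBER>[0-9]+)"
    r"|(?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<OP>[-+*/=()])"
    r"|(?P<SKIP>\s+)"
)


def scan_with_recovery(source: str):
    """Return (tokens, errors); errors are (offset, discarded_text) pairs."""
    tokens, errors = [], []
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is not None:
            if match.lastgroup != "SKIP":
                tokens.append((match.lastgroup, match.group()))
            pos = match.end()
            continue
        # Lexical error: throw away the invalid text and skip forward until
        # a character is found that can legitimately begin a new token.
        bad_start = pos
        while pos < len(source) and TOKEN.match(source, pos) is None:
            pos += 1
        errors.append((bad_start, source[bad_start:pos]))
        # The outer loop then restarts scanning at the new position.
    return tokens, errors


tokens, errors = scan_with_recovery("x = 3 $@ y")
print(tokens)
print(errors)
```

Collecting the errors instead of stopping at the first one lets the compiler report several lexical problems in a single run.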

    29. Practice: Write regular expressions for the following:
        Java keywords
        Integer numbers
        C# identifiers
        Java float numbers (e.g., 1e1f, 2.f, .3f, 0f, 3.14f, 6.022137e+23f)
        Hexadecimal numbers (e.g., 0x1E3a)
