Course Overview
PART I: overview material
  1 Introduction
  2 Language processors (tombstone diagrams, bootstrapping)
  3 Architecture of a compiler
PART II: inside a compiler
  4 Syntax analysis
  5 Contextual analysis
  6 Runtime organization
  7 Code generation
PART III: conclusion
  8 Interpretation
  9 Review
Supplementary material: Theoretical foundations (Regular expressions)
Regular Expressions
• A finite state machine (FSM) is a good “visual” aid
  • but it is not very suitable as a specification (its textual description is too clumsy)
• Regular expressions are a suitable specification
  • a more compact way to define a language that can be accepted by an FSM
• Used to give the lexical description of a programming language
  • define each “token” (keywords, identifiers, literals, operators, punctuation, etc.)
  • define white space, comments, etc.
    • these are not tokens, but must be recognized and ignored
• | means "or"
• . means "followed by" (the dot may be omitted)
• * means "zero or more instances of"
• ( ) are used for grouping
Example: Pascal identifier
• Lexical specification (in English):
  • a letter, followed by zero or more letters or digits
• Lexical specification (as a regular expression):
  • letter . (letter | digit)*
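As a sketch of how such a specification is used in practice, the Pascal identifier expression can be transcribed into Python's `re` syntax. This assumes "letter" means only the ASCII letters a to z and A to Z and "digit" means 0 to 9; `is_pascal_identifier` is a hypothetical helper name.

```python
import re

# letter . (letter | digit)*  written in Python re syntax.
# Assumption: letter = [A-Za-z], digit = [0-9] (ASCII only).
PASCAL_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_pascal_identifier(s: str) -> bool:
    # fullmatch requires the whole string to match, not just a prefix
    return PASCAL_IDENT.fullmatch(s) is not None

for s in ["x", "count2", "2count", "my_var", ""]:
    print(repr(s), is_pascal_identifier(s))
```

Using `fullmatch` rather than `match` matters here: `match` would accept "my_var" because its prefix "my" is a valid identifier.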
Operands of a Regular Expression
• Operands are the same as the labels on the edges of an FSM:
  • single characters, or
  • the special character ε (the empty string)
• "letter" is a shorthand for
  • a | b | c | ... | z | A | B | C | ... | Z
• "digit" is a shorthand for
  • 0 | 1 | 2 | ... | 9
• Sometimes we put characters in quotes
  • this is necessary when denoting the characters | . * ( ) themselves
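Precedence of the | . * Operators
• * binds most tightly, then . (concatenation), then |
• Consider the regular expressions:
  • letter.letter | digit*  means  (letter.letter) | (digit*)
  • letter.(letter | digit)*  means  letter . ((letter | digit)*)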
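To see that the two expressions define different languages, here is a small sketch that transcribes both into Python's `re` syntax (which uses the same precedence: star, then concatenation, then alternation) and tests a few strings. The names `first`, `second` and `matches` are illustrative, and letter and digit are assumed to be ASCII `[A-Za-z]` and `[0-9]`.

```python
import re

# Assumption: letter = [A-Za-z], digit = [0-9].
L, D = "[A-Za-z]", "[0-9]"

# letter.letter | digit*  parses as  (letter.letter) | (digit*)
first = re.compile(f"{L}{L}|{D}*")
# letter.(letter | digit)*  : the parentheses force | to apply before *
second = re.compile(f"{L}(?:{L}|{D})*")

def matches(pattern, s):
    return pattern.fullmatch(s) is not None

for s in ["ab", "a1", "123", "", "abc"]:
    print(f"{s!r:7} first={matches(first, s)} second={matches(second, s)}")
```

Only "ab" is in both languages: the first expression accepts exactly two letters or any run of digits (including the empty string), while the second accepts identifiers of any length.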
TEST YOURSELF
Question 1: Describe (in English) the language defined by each of the following regular expressions:
• letter (letter* | digit*)
• (letter | _ ) (letter | digit | _ )*
• digit* "." digit*
• digit digit* "." digit digit*
TEST YOURSELF
Question 2: Write a regular expression for each of these languages:
• The set of all C++ reserved words
  • Examples: if, while, for, class, int, case, char, true, false
• C++ string literals that begin with " and end with " and don't contain any other " except possibly in the escape sequence \"
  • Example: "The escape sequence \" occurs in this string"
• C++ comments that begin with /* and end with */ and don't contain any other */ within the comment
  • Example: /* This is a comment * still the same comment */
Example: Integer Literals
• An integer literal with an optional sign can be defined in English as:
  • "(nothing or + or -) followed by one or more digits"
• The corresponding regular expression is:
  • (+ | - | ε) (digit . digit*)
• A new convenient operator '+'
  • same precedence as '*'
  • digit digit* is the same as digit+, which means "one or more digits"
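A quick way to gain confidence that digit digit* and digit+ describe the same language is a brute-force comparison over all short strings. The sketch below assumes Python's `re` syntax, where an empty alternative `(\+|-|)` plays the role of ε and `?` marks an optional part; `same_language_up_to` is a hypothetical helper, and a bounded check like this is evidence, not a proof.

```python
import re
from itertools import product

# Slide form: (+ | - | ε) (digit . digit*); Python allows an empty alternative
slide_form = re.compile(r"(?:\+|-|)[0-9][0-9]*")
# Same language written with the shorthand digit+ and an optional sign
short_form = re.compile(r"[+-]?[0-9]+")

def same_language_up_to(n, alphabet="01+-"):
    """True if both patterns accept exactly the same strings of length <= n."""
    for k in range(n + 1):
        for chars in product(alphabet, repeat=k):
            s = "".join(chars)
            if bool(slide_form.fullmatch(s)) != bool(short_form.fullmatch(s)):
                return False
    return True

print(same_language_up_to(5))
```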
Language Defined by a Regular Expression
• Recall: language = set of strings
• Language defined by an automaton
  • the set of strings accepted by the automaton
• Language defined by a regular expression
  • the set of strings that match the expression

Regular Exp.      Corresponding Set of Strings
ε                 {""}
a                 {"a"}
a.b.c             {"abc"}
a | b | c         {"a", "b", "c"}
(a | b | c)*      {"", "a", "b", "c", "aa", "ab", ..., "bccabb", ...}
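The "set of strings that match the expression" can be sampled directly: enumerate every string over an alphabet up to some length and keep those that match. This sketch uses Python's `re` module; `language_up_to` is a hypothetical helper, and since a language such as (a | b | c)* is infinite, it only ever shows a finite slice.

```python
import re
from itertools import product

def language_up_to(pattern: str, alphabet: str, max_len: int) -> list:
    """Strings over `alphabet` of length <= max_len that match `pattern` in full."""
    rx = re.compile(pattern)
    result = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if rx.fullmatch(s):
                result.append(s)
    return result

print(language_up_to(r"a|b|c", "abc", 3))           # ['a', 'b', 'c']
print(language_up_to(r"abc", "abc", 3))             # ['abc']
print(len(language_up_to(r"(a|b|c)*", "abc", 2)))   # 13 = 1 + 3 + 9
```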
Concept of a Regular Expression Generating a String
• Rewrite the regular expression until only a sequence of characters (a string) is left
• Replacement rules:
  1) r1 | r2 → r1
  2) r1 | r2 → r2
  3) r* → r r*
  4) r* → ε
• Example
  (0|1)* 2 (0|1)*
  → (0|1) (0|1)* 2 (0|1)*    (rule 3)
  → 1 (0|1)* 2 (0|1)*        (rule 2)
  → 1 2 (0|1)*               (rule 4)
  → 1 2 (0|1) (0|1)*         (rule 3)
  → 1 2 (0|1)                (rule 4)
  → 1 2 0                    (rule 1)
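The four replacement rules can be turned into a random string generator. This is a sketch under a hypothetical tuple encoding of regular expressions (not from the slides): `("chr", c)`, `("alt", r1, r2)`, `("cat", r1, r2)` and `("star", r)`. It always rewrites the leftmost item that is not yet a character; the slides allow any order, but because the items are rewritten independently this yields the same set of strings.

```python
import random
import re

# Hypothetical encoding: ("chr", c) | ("alt", r1, r2) | ("cat", r1, r2) | ("star", r)
def generate(regex, rng):
    """Apply the replacement rules until only characters remain."""
    form = [regex]                       # the current, partially rewritten form
    while True:
        i = next((k for k, item in enumerate(form) if item[0] != "chr"), None)
        if i is None:                    # only characters left: we have a string
            return "".join(item[1] for item in form)
        item = form[i]
        if item[0] == "alt":             # rules 1 and 2: pick one alternative
            form[i:i + 1] = [item[rng.choice((1, 2))]]
        elif item[0] == "star":          # rule 3: r* -> r r*   rule 4: r* -> ε
            form[i:i + 1] = [item[1], item] if rng.random() < 0.5 else []
        elif item[0] == "cat":           # r1 . r2 is simply r1 followed by r2
            form[i:i + 1] = [item[1], item[2]]
        else:
            raise ValueError(f"unknown node {item[0]!r}")

zero_or_one = ("alt", ("chr", "0"), ("chr", "1"))
# (0|1)* 2 (0|1)*
example = ("cat", ("star", zero_or_one), ("cat", ("chr", "2"), ("star", zero_or_one)))

rng = random.Random(1)
samples = [generate(example, rng) for _ in range(5)]
print(samples)
# Every generated string is in the language of the expression:
print(all(re.fullmatch(r"[01]*2[01]*", s) for s in samples))
```

Different random choices give different strings, which is exactly the non-determinism of the rewriting process.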
Non-determinism in Generation
• Different rule applications may yield different final results
• Example 1
  (0|1)* 2 (0|1)*
  → (0|1) (0|1)* 2 (0|1)*
  → 1 (0|1)* 2 (0|1)*
  → 1 2 (0|1)*
  → 1 2 (0|1) (0|1)*
  → 1 2 (0|1)
  → 1 2 0
• Example 2
  (0|1)* 2 (0|1)*
  → (0|1) (0|1)* 2 (0|1)*
  → 0 (0|1)* 2 (0|1)*
  → 0 2 (0|1)*
  → 0 2 (0|1) (0|1)*
  → 0 2 (0|1)
  → 0 2 1
Concept of the Language Generated by a Regular Expression
• The set of all strings generated by a regular expression is the language of the regular expression
• In general, the language may be infinite
• A string in the language generated by a regular expression is often called a “token”
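Because the language may be infinite, a program can only compute a finite slice of it. The sketch below computes every string of length at most n in the language, using a hypothetical tuple encoding (`("eps",)`, `("chr", c)`, `("alt", r1, r2)`, `("cat", r1, r2)`, `("star", r)`). It works directly on sets of strings rather than by repeated rewriting, which is a different but equivalent way to describe the same language.

```python
def lang(r, n):
    """All strings of length <= n in the language of r."""
    kind = r[0]
    if kind == "eps":
        return {""}
    if kind == "chr":
        return {r[1]} if n >= 1 else set()
    if kind == "alt":
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":
        left, right = lang(r[1], n), lang(r[2], n)
        return {x + y for x in left for y in right if len(x) + len(y) <= n}
    if kind == "star":
        inner = lang(r[1], n) - {""}          # ε adds nothing new inside a star
        result, frontier = {""}, {""}
        while frontier:                       # keep appending one more copy of r
            frontier = {x + y for x in frontier for y in inner
                        if len(x) + len(y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")

zero_or_one = ("alt", ("chr", "0"), ("chr", "1"))
# (0|1)* 2 (0|1)*
example = ("cat", ("star", zero_or_one), ("cat", ("chr", "2"), ("star", zero_or_one)))

print(sorted(lang(example, 2)))                     # ['02', '12', '2', '20', '21']
print([len(lang(example, n)) for n in range(1, 6)])  # [1, 5, 17, 49, 129]
```

The slice keeps growing as n grows, which reflects that the language of (0|1)* 2 (0|1)* is infinite.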
Examples of Languages and Regular Expressions
• Σ = { 0, 1, . }
  • (0 | 1)+ "." (0 | 1)* | (0 | 1)* "." (0 | 1)+    binary floating-point numbers
  • (0 0)*    even-length all-zero strings
  • 1* (0 1* 0 1*)*    binary strings with an even number of zeros
• Σ = { a, b, c, 0, 1, 2 }
  • (a|b|c)(a|b|c|0|1|2)*    alphanumeric identifiers
  • (0|1|2)+    trinary numbers
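These examples can be transcribed into Python's `re` syntax and spot-checked. One pitfall when doing so: the slide's quoted "." must become `\.` because a bare `.` in Python matches any character. The last lines compare the even-number-of-zeros expression against a direct count for every binary string up to length 8; the variable names are illustrative.

```python
import re
from itertools import product

# The slide's examples in Python re syntax ("." is escaped as \.).
binary_float = re.compile(r"[01]+\.[01]*|[01]*\.[01]+")
even_length_zeros = re.compile(r"(?:00)*")
even_number_of_zeros = re.compile(r"1*(?:01*01*)*")
identifier = re.compile(r"[abc][abc012]*")
trinary = re.compile(r"[012]+")

# Brute-force check: the regex agrees with counting zeros directly.
ok = all(
    bool(even_number_of_zeros.fullmatch(s)) == (s.count("0") % 2 == 0)
    for n in range(9)
    for s in map("".join, product("01", repeat=n))
)
print(ok)  # True
```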
Regular Expression Notational Shorthand
• R+   one or more strings of R: R(R*)
• R?   optional R: (R | ε)
• [abcd]   one of the listed characters: (a|b|c|d)
• [a-z]   one character from this range: (a|b|c|d|...|z)
• [^abc]   any one character except the listed ones
• [^a-z]   any one character not in this range
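Each shorthand is only a convenience: it can be expanded into |, . and * (plus ε). The sketch below checks a few such expansions with Python's `re` module over every string of length at most 4 from a small test alphabet. Note that [^abc] can only be expanded relative to a fixed alphabet; here the alphabet "abcdef" is an assumption, so [^abc] becomes d|e|f. The helper `agree` is hypothetical.

```python
import re
from itertools import product

# (shorthand, expansion using only | . * and an empty alternative for ε)
pairs = [
    (r"(?:ab)+", r"(?:ab)(?:ab)*"),   # R+  ==  R R*
    (r"(?:ab)?", r"(?:ab|)"),         # R?  ==  (R | ε)
    (r"[abcd]",  r"a|b|c|d"),         # [abcd]
    (r"[a-e]",   r"a|b|c|d|e"),       # a short range, for brevity
    (r"[^abc]",  r"d|e|f"),           # only valid relative to the alphabet below
]

def agree(p, q, alphabet="abcdef", max_len=4):
    """True if p and q accept the same strings over `alphabet` up to max_len."""
    rp, rq = re.compile(p), re.compile(q)
    return all(
        bool(rp.fullmatch(s)) == bool(rq.fullmatch(s))
        for n in range(max_len + 1)
        for s in map("".join, product(alphabet, repeat=n))
    )

for short, expanded in pairs:
    print(f"{short:10} {expanded:16} {agree(short, expanded)}")
```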
Equivalence of FSMs and Regular Expressions
• Theorem:
  • For each finite state machine M, we can construct a regular expression R such that M and R accept the same language.
  • [proof omitted]
• Theorem:
  • For each regular expression R, we can construct a finite state machine M such that R and M accept the same language.
  • [proof outline follows]
Regular Expressions to NFSM (1)
• For each kind of regular expression, define an NFSM
• Notation: [figure: box labeled M] denotes the NFSM for regular expression M
• For ε: [figure]
• For input a: [figure]
Regular Expressions to NFSM (2)
• For A . B: [figure: NFSM built from the NFSMs for A and B]
• For A | B: [figure: NFSM built from the NFSMs for A and B]
Regular Expressions to NFSM (3)
• For A*: [figure: NFSM built from the NFSM for A]
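The case-by-case construction can be written as a short recursive function. This sketch follows the standard Thompson construction (one start and one final state per sub-machine, glued with ε-moves); the slides' figures may differ in small details. Regular expressions use a hypothetical tuple encoding, states are numbered 0, 1, 2, ... rather than lettered, and `accepts` simulates the NFSM by tracking the set of states it could be in.

```python
from itertools import count

EPS = None  # edge label used for ε-moves

# Hypothetical encoding: ("eps",) | ("chr", c) | ("cat", A, B) | ("alt", A, B) | ("star", A)
def to_nfsm(regex):
    """Return (start, final, edges); edges are (from_state, label, to_state)."""
    ids = count()
    edges = []

    def build(r):
        kind = r[0]
        if kind in ("eps", "chr"):             # base cases: a single edge
            s, f = next(ids), next(ids)
            edges.append((s, EPS if kind == "eps" else r[1], f))
            return s, f
        if kind == "cat":                      # A . B: A's final --ε--> B's start
            s1, f1 = build(r[1])
            s2, f2 = build(r[2])
            edges.append((f1, EPS, s2))
            return s1, f2
        if kind == "alt":                      # A | B: new start/final around both
            s, f = next(ids), next(ids)
            for sub in (r[1], r[2]):
                si, fi = build(sub)
                edges.extend([(s, EPS, si), (fi, EPS, f)])
            return s, f
        if kind == "star":                     # A*: skip A, or loop back through it
            s, f = next(ids), next(ids)
            si, fi = build(r[1])
            edges.extend([(s, EPS, si), (s, EPS, f), (fi, EPS, si), (fi, EPS, f)])
            return s, f
        raise ValueError(f"unknown node {kind!r}")

    start, final = build(regex)
    return start, final, edges

def eps_closure(states, edges):
    """All states reachable from `states` using only ε-moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p, label, r in edges:
            if p == q and label is EPS and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(nfsm, text):
    start, final, edges = nfsm
    current = eps_closure({start}, edges)
    for ch in text:
        moved = {r for p, label, r in edges if p in current and label == ch}
        current = eps_closure(moved, edges)
    return final in current

# (1|0)*1, the example expression used in these slides
m = to_nfsm(("cat", ("star", ("alt", ("chr", "1"), ("chr", "0"))), ("chr", "1")))
for s in ["1", "0101", "10", ""]:
    print(repr(s), accepts(m, s))
```

Tracking a set of current states is the same idea the NFSM to DFSM conversion makes explicit: each such set becomes one DFSM state.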
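Example of RegExp -> NFSM Conversion
• Consider the regular expression (1|0)*1
• The NFSM is: [figure: NFSM with states A to J; the edge from C to E is labeled 1, the edge from D to F is labeled 0, and the edge from I to J is labeled 1]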
Converting NFSM to DFSM
• Simulate the NFSM
• Each state of the DFSM
  • is a non-empty subset of the states of the NFSM
• Start state of the DFSM
  • is the set of NFSM states reachable from the NFSM start state using only ε-moves
• Add a transition S --a--> S' to the DFSM iff
  • S' is the set of NFSM states reachable from any state in S after consuming only the input a, considering ε-moves as well
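The subset construction can be sketched directly from these rules. The NFSM below is for (1|0)*1 with states A to J; its edge list is an assumption, a reconstruction of the slide figure chosen so that it reproduces the DFSM states listed in the example (ABCDHI, FGHIABCD, EJGHIABCD), with J assumed to be the final state. `eps_closure`, `move` and `subset_construction` are hypothetical helper names.

```python
EPS = None  # label for ε-moves

# Assumed reconstruction of the NFSM for (1|0)*1 (states A..J).
NFSM_EDGES = [
    ("A", EPS, "B"), ("A", EPS, "H"),
    ("B", EPS, "C"), ("B", EPS, "D"),
    ("C", "1", "E"), ("D", "0", "F"),
    ("E", EPS, "G"), ("F", EPS, "G"),
    ("G", EPS, "A"), ("G", EPS, "H"),
    ("H", EPS, "I"), ("I", "1", "J"),
]
START, FINAL = "A", "J"

def eps_closure(states):
    """States reachable from `states` using only ε-moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p, label, r in NFSM_EDGES:
            if p == q and label is EPS and r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def move(states, symbol):
    """States reachable by consuming exactly one `symbol`."""
    return {r for p, label, r in NFSM_EDGES if p in states and label == symbol}

def subset_construction(alphabet):
    start = eps_closure({START})
    dfsm, worklist = {}, [start]
    while worklist:
        S = worklist.pop()
        if S in dfsm:
            continue
        dfsm[S] = {}
        for a in alphabet:
            target = move(S, a)
            if not target:           # DFSM states must be non-empty subsets
                continue
            S2 = eps_closure(target)
            dfsm[S][a] = S2
            if S2 not in dfsm:
                worklist.append(S2)
    return start, dfsm

start, dfsm = subset_construction("01")
for S, trans in dfsm.items():
    marks = (" (start)" if S == start else "") + (" (accepting)" if FINAL in S else "")
    print("".join(sorted(S)) + marks, {a: "".join(sorted(T)) for a, T in trans.items()})
```

The state names are printed in alphabetical order (for example ABCDFGHI rather than FGHIABCD), but they denote the same sets of NFSM states.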
Remarks on Converting NFSM to DFSM
• An NFSM may be in many states at any time
• How many different states?
  • If there are N states, the NFSM must be in some subset of those N states
• How many subsets are there?
  • 2^N = finitely many
  • For example, if N = 5 then 2^N = 32 subsets
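The 2^N count can be checked by listing the subsets. Since DFSM states are non-empty subsets, at most 2^N - 1 of them can ever be DFSM states, and usually far fewer are reachable (the (1|0)*1 example has 10 NFSM states but only 3 DFSM states). The five state names below are hypothetical.

```python
from itertools import combinations

def subsets(states):
    """All subsets of `states`, including the empty set."""
    return [set(c) for k in range(len(states) + 1) for c in combinations(states, k)]

all_subsets = subsets("ABCDE")                 # N = 5 hypothetical NFSM states
print(len(all_subsets))                        # 32 == 2**5
print(len([s for s in all_subsets if s]))      # 31 non-empty subsets
```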
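NFSM -> DFSM Example
• NFSM for (1|0)*1 with states A to J (from the previous example)
• Resulting DFSM, each state named by the NFSM states it contains:

DFSM state                on 0        on 1
ABCDHI (start)            FGHIABCD    EJGHIABCD
FGHIABCD                  FGHIABCD    EJGHIABCD
EJGHIABCD (contains J)    FGHIABCD    EJGHIABCD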
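Once the DFSM is built, running it needs no sets or ε-moves: each input character is a single table lookup. This sketch writes the three-state DFSM for (1|0)*1 as a Python dictionary using the state names from this example; it assumes J is the NFSM's final state, so EJGHIABCD is the only accepting state. `run` is a hypothetical helper.

```python
# DFSM for (1|0)*1, one dictionary entry per state.
DFSM = {
    "ABCDHI":    {"0": "FGHIABCD", "1": "EJGHIABCD"},
    "FGHIABCD":  {"0": "FGHIABCD", "1": "EJGHIABCD"},
    "EJGHIABCD": {"0": "FGHIABCD", "1": "EJGHIABCD"},
}
START = "ABCDHI"
ACCEPTING = {"EJGHIABCD"}   # assumption: J is the NFSM's final state

def run(text):
    """Follow one transition per character; reject on a missing transition."""
    state = START
    for ch in text:
        if ch not in DFSM[state]:
            return False
        state = DFSM[state][ch]
    return state in ACCEPTING

for s in ["1", "0011", "10", ""]:
    print(repr(s), run(s))
```

The result agrees with reading (1|0)*1 directly: the DFSM accepts exactly the non-empty strings of 0s and 1s that end in 1.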
TEST YOURSELF
Question 3: First convert each of these regular expressions to an NFSM
• (a | b | ε) (a | b)
• (ab | ba)* (aa | bb)
Question 4: Next convert each resulting NFSM to a DFSM