Languages and Machines Unit two: Regular languages and Finite State Automata
Review of week one • A language is a set of strings (the set of different things you can say). It may be infinite. • A string is a finite sequence of symbols. Its length may be zero (the empty string) but is always finite. • A symbol is just some mark on the page or screen. A language has a finite alphabet of symbols.
Review of week one • In a context-dependent language, the meaning of a phrase depends on the context • In a context-sensitive language, the structure of a phrase depends on the context • Most natural languages are context-dependent but not context-sensitive • A context-free language is one where the structure of a phrase is always the same, independent of context • A regular language is a context-free language which has simple rules for forming valid strings (e.g. "94", "getWidth()")
Classes of formal language • nested classes, from most general to most restrictive: phrase structure ⊃ context-sensitive ⊃ context-free ⊃ regular
Regular languages • Here are examples of strings from a regular language with alphabet {a,b}: • a • b • ab • aaaaa • ababab
Regular languages • the empty set is a regular language • the set consisting of the empty string (ε) is a regular language • the set consisting of a one-symbol string is a regular language • a new regular language can be made by taking a string from a regular language and concatenating it with a string from a regular language • a new regular language can be made by taking the union of two regular languages
Recognizing regular languages • regular languages can be recognized and interpreted by a finite-state machine • for example, here is a machine to recognize a two-bit string: [Diagram: finite-state machine with transitions labelled 0 and 1, with its acceptor states marked]
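A minimal sketch of such a recognizer, written as a table-driven machine in Python. The state names (`start`, `one_bit`, `two_bits`) and the function name `accepts` are assumptions made for illustration; the slide's diagram does not name its states.

```python
# Table-driven recognizer for the two-bit strings 00, 01, 10 and 11.
# State names are illustrative, not taken from the slide.
TRANSITIONS = {
    ("start", "0"): "one_bit",
    ("start", "1"): "one_bit",
    ("one_bit", "0"): "two_bits",
    ("one_bit", "1"): "two_bits",
}
ACCEPTING = {"two_bits"}


def accepts(string: str) -> bool:
    """Return True if the machine ends in an acceptor state."""
    state = "start"
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition for this symbol: reject
            return False
    return state in ACCEPTING
```

Calling `accepts("01")` follows two transitions and ends in the acceptor state, while `accepts("011")` fails because no transition leaves `two_bits`.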
Regular expressions Wouldn’t it be nice if we had a compact way of specifying a regular language? • we have! • it’s a special notation called a regular expression
Examples of regular languages • the set of all two-symbol strings containing the letters a and b: (a|b)² • the set of all two-bit strings: (0|1)² • the set of all possible words: (a|..|z)+ • the set of all decimal integers: (0|(1|..|9)(0|..|9)*) • the set of Java identifiers: JavaLetter JavaLetterOrDigit*
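As a rough illustration, most of these expressions can be tried out with Python's `re` module. The translation is an assumption of this sketch: the `..` ranges become character classes, the power 2 becomes `{2}`, and the dictionary keys and the helper name `in_language` are hypothetical. The Java identifier example is left out because the slides do not define JavaLetter.

```python
import re

# Slide notation translated into Python `re` syntax (an assumed mapping).
PATTERNS = {
    "two_symbol_ab": r"(a|b){2}",          # (a|b) to the power 2
    "two_bit": r"(0|1){2}",                # (0|1) to the power 2
    "word": r"[a-z]+",                     # (a|..|z)+
    "decimal_integer": r"0|[1-9][0-9]*",   # (0|(1|..|9)(0|..|9)*)
}


def in_language(name: str, string: str) -> bool:
    """True if the whole string belongs to the named language."""
    return re.fullmatch(PATTERNS[name], string) is not None
```

`re.fullmatch` is used rather than `re.search` because membership in a language is a question about the whole string, not about some substring of it.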
More examples of regular languages • all the possible three-bit strings: (0|1)³ • all the single-digit decimal numbers: (0|1|2|3|4|5|6|7|8|9), abbreviated (0|..|9) • all the possible repetitions of the traffic-light sequence (red, amber, green, amber): (red amber green amber)*
Activity Write down the regular expressions denoting the following regular languages: • The language with two strings “the cat” and “the mat” • Arithmetic expressions with two operands, e.g. 1 + 2, 3 × 4. The allowed operators are +, -, ×, ÷; the allowed operands are single-digit decimal numbers • The language consisting of all possible binary strings • The language of HTML tags such as <HEAD>
Suggested Answers • The language with two strings “the cat” and “the mat”: the (cat | mat) or (the (c|m)at) • Arithmetic expressions with two operands, e.g. 1 + 2, 3 × 4: (0|..|9) (+|-|×|÷) (0|..|9) • The language consisting of all possible binary strings: (0|1)* • The language of HTML tags such as <HEAD>: < (A|..|Z)+ >
A cautionary note • You have been using a metalanguage! • The regular expression strings form a language having terminal symbols ( ) + * | plus literal symbols e.g. a stands for the letter a • this can cause problems when the metalanguage and the language get confused e.g. the language consisting of strings of one to three vertical bars: | | || | |||
A cautionary note • we can fix this by some ghastly escape convention, e.g. convert the above to "|" | "||" | "|||" • now we have problems with the quote symbol! • the best idea is to choose metalanguage symbols which are rarely encountered in the language being described, and use bold-face or color to distinguish
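Regular expression libraries face the same confusion and each picks its own escape convention. As one illustration, here is a sketch using Python's `re.escape` for the language of one to three vertical bars; the names `one_to_three_bars` and `is_bars` are hypothetical.

```python
import re

# re.escape turns the metalanguage symbol "|" into the literal "\|".
bar = re.escape("|")

# One to three literal vertical bars.
one_to_three_bars = re.compile(f"(?:{bar}){{1,3}}")


def is_bars(string: str) -> bool:
    """True for "|", "||" and "|||" only."""
    return one_to_three_bars.fullmatch(string) is not None


# Without escaping, the pattern "|" means "empty string or empty string",
# so it accepts "" and rejects an actual vertical bar.
unescaped_accepts_empty = re.fullmatch("|", "") is not None
unescaped_accepts_bar = re.fullmatch("|", "|") is not None
```

The last two lines show the confusion the slide warns about: the unescaped bar is read as the metalanguage's choice operator, not as a symbol of the language being described.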
Regular languages and regular expressions • Each regular expression denotes a regular language: • ∅ denotes the empty set • ε denotes the set consisting of the empty string • a denotes the set consisting of a one-symbol string (e.g. "a") • a b denotes the language made by taking a string from a regular language and concatenating it with a string from a regular language • a | b denotes the language made by taking the union of two regular languages
Regular languages and regular expressions The other ways of forming regular expressions are just shorthand: • a⁰ = ε • a¹ = a • a² = a a • a* = ε | a | a a | a a a | ... • a+ = a | a a | a a a | ...
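The shorthand can be unfolded mechanically. The sketch below treats a language as a Python set of strings, with `""` standing for the empty string. Because a* and a+ are infinite, the helpers only enumerate them up to a chosen number of repetitions; all function names are assumptions made for illustration.

```python
def concat(first, second):
    """Every string of `first` followed by every string of `second`."""
    return {x + y for x in first for y in second}


def power(language, n):
    """language to the power n; power 0 is the set holding the empty string."""
    result = {""}
    for _ in range(n):
        result = concat(result, language)
    return result


def star_up_to(language, max_reps):
    """Finite part of language*: union of powers 0 .. max_reps."""
    strings = set()
    for n in range(max_reps + 1):
        strings |= power(language, n)
    return strings


def plus_up_to(language, max_reps):
    """Finite part of language+: union of powers 1 .. max_reps."""
    strings = set()
    for n in range(1, max_reps + 1):
        strings |= power(language, n)
    return strings
```

For the one-string language `{"a"}`, `star_up_to({"a"}, 3)` gives the empty string, a, aa and aaa, while `plus_up_to({"a"}, 3)` gives the same set without the empty string.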
Regular languages and regular expressions • Brackets are used to show precedence of the operations, e.g. (a | b)* versus a | b* • default precedence, from highest to lowest: • * or + or ⁿ • concatenation • |
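Python's `re` module follows the same default precedence (repetition, then concatenation, then `|`), so the difference brackets make can be checked directly. This is a small sketch; the helper name `matches` is hypothetical.

```python
import re


def matches(pattern: str, string: str) -> bool:
    """True if the whole string is in the language of the pattern."""
    return re.fullmatch(pattern, string) is not None


# (a | b)* : any string of a's and b's, including the empty string.
# a | b*   : either the single string "a", or any run of b's.
BRACKETED = r"(a|b)*"
UNBRACKETED = r"a|b*"
```

For example, `matches(BRACKETED, "abba")` holds but `matches(UNBRACKETED, "abba")` does not, because without brackets the star applies only to b.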
Activity Give examples of the following languages: • (x | y | z)³ • x | y | z* • a b² • (a b)²
Suggested Answers Give examples of the following languages: • (x | y | z)³: xzy • x | y | z* • a b²: abb • (a b)²: abab
From Regular Expressions to Finite State Automata • It is an amazing fact that any regular expression has an equivalent finite state automaton which recognizes the language it denotes • and the language recognized by any finite state automaton is denoted by some regular expression • we will prove these propositions later
Finite State Machines • an FSM to add two binary numbers [Diagram: states A to F joined by transitions, with callouts for the start state, end state, transition, input symbol and output symbol]
Finite state automata • These are simple machines with no output symbols • they can only recognize strings of input symbols • acceptance is shown by a special state
NFAs • The kind of finite state automata we shall be using are called nondeterministic finite automata • "nondeterministic" means we can do naughty things like: • have a transition without a symbol • label two exit transitions with the same symbol • not show the paths which lead to failure
Example of an NFA • what regular language does this NFA represent? [Diagram: NFA with transitions labelled a, b and c] • answer: a b | a b c | a+
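To make the idea concrete, here is a sketch of how an NFA for a b | a b c | a+ can be simulated by tracking the set of states it might be in. The state numbering and transition table are a hypothetical reconstruction, not the slide's diagram. It uses all three "naughty things": two exits labelled a from the start state, one transition without a symbol, and no explicit failure paths.

```python
EPS = None  # label used for a transition without a symbol

# Hypothetical NFA for  a b | a b c | a+  (state numbers are illustrative).
NFA = {
    (0, "a"): {1, 4},  # two exit transitions with the same symbol
    (1, "b"): {2},
    (2, "c"): {3},
    (4, EPS): {5},     # a transition without a symbol
    (5, "a"): {4},
}
START = 0
ACCEPTING = {2, 3, 4}


def epsilon_closure(states):
    """All states reachable from `states` using only EPS transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in NFA.get((state, EPS), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def accepts(string: str) -> bool:
    """True if some path through the NFA ends in an accepting state."""
    current = epsilon_closure({START})
    for symbol in string:
        moved = set()
        for state in current:
            # A missing entry is an unshown failure path: it adds nothing.
            moved |= NFA.get((state, symbol), set())
        current = epsilon_closure(moved)
    return bool(current & ACCEPTING)
```

Following every possible path at once is what lets the simulation cope with nondeterminism: the string is accepted if any one of the paths succeeds.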
Examples of conversion from REs to NFAs • (a b)² • a b² • (a | b)² • (a | b)* [Diagrams: one NFA for each expression, with transitions labelled a and b]
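One standard way to carry out such conversions mechanically is Thompson's construction, which is not necessarily the method these slides use. The sketch below builds an NFA fragment for each operator and joins fragments with symbol-free transitions. The tuple-based expression format (`"sym"`, `"cat"`, `"alt"`, `"star"`, `"pow"`) and all names are assumptions made for illustration.

```python
from itertools import count

_ids = count()  # supplies fresh state numbers


class Fragment:
    """An NFA fragment with one start state and one accepting state."""

    def __init__(self):
        self.start = next(_ids)
        self.accept = next(_ids)
        self.edges = []  # (source, label, target); label None = no symbol


def symbol(ch):
    f = Fragment()
    f.edges = [(f.start, ch, f.accept)]
    return f


def concat(x, y):
    f = Fragment()
    f.edges = x.edges + y.edges + [
        (f.start, None, x.start),
        (x.accept, None, y.start),
        (y.accept, None, f.accept),
    ]
    return f


def union(x, y):
    f = Fragment()
    f.edges = x.edges + y.edges + [
        (f.start, None, x.start), (f.start, None, y.start),
        (x.accept, None, f.accept), (y.accept, None, f.accept),
    ]
    return f


def star(x):
    f = Fragment()
    f.edges = x.edges + [
        (f.start, None, x.start), (f.start, None, f.accept),
        (x.accept, None, x.start), (x.accept, None, f.accept),
    ]
    return f


def build(expr):
    """Build a fragment from a nested tuple such as ("cat", e1, e2)."""
    op = expr[0]
    if op == "sym":
        return symbol(expr[1])
    if op == "cat":
        return concat(build(expr[1]), build(expr[2]))
    if op == "alt":
        return union(build(expr[1]), build(expr[2]))
    if op == "star":
        return star(build(expr[1]))
    if op == "pow":  # n >= 1 fresh copies joined by concatenation
        frag = build(expr[1])
        for _ in range(expr[2] - 1):
            frag = concat(frag, build(expr[1]))
        return frag
    raise ValueError(f"unknown operator: {op}")


def accepts(frag, string):
    """Simulate the fragment by tracking the set of reachable states."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            state = stack.pop()
            for src, label, dst in frag.edges:
                if src == state and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({frag.start})
    for ch in string:
        current = closure({dst for src, label, dst in frag.edges
                           if src in current and label == ch})
    return frag.accept in current
```

With `A = ("sym", "a")` and `B = ("sym", "b")`, the four slide examples become `("pow", ("cat", A, B), 2)`, `("cat", A, ("pow", B, 2))`, `("pow", ("alt", A, B), 2)` and `("star", ("alt", A, B))`. The resulting NFAs have more states than hand-drawn ones, which is the usual price of a purely mechanical construction.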
Activity Convert the following regular expressions to NFAs: • JavaLetter JavaLetterOrDigit* • (red amber green amber)* Convert the following NFAs to REs: [Diagrams: two NFAs, with transitions labelled a, b, c and d]
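Suggested answer • NFAs to REs: (a b)* and (a c | b d)+ • REs to NFAs: [Diagrams: one NFA with transitions labelled javaLetter and javaLetterOrDigit; one NFA with transitions labelled red, amber, green and amber]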
Summary • regular expressions give us a neat notation for describing regular languages • nondeterministic finite automata (NFAs) provide a diagrammatic version of regular expressions • these notations are equivalent • finite automata theory is crucial in generating lexical analyzers from regular expressions
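As a closing illustration of the last point, here is a sketch of a tiny lexical analyzer driven by regular expressions, using Python's `re` module. The token names and patterns are assumptions: `IDENT` is only an ASCII approximation of a Java identifier, and `INTEGER` and `OP` reuse the decimal-integer and operator examples from earlier slides.

```python
import re

# Hypothetical token specification: (token name, regular expression).
TOKEN_SPEC = [
    ("INTEGER", r"0|[1-9][0-9]*"),
    ("IDENT", r"[A-Za-z_$][A-Za-z0-9_$]*"),  # ASCII-only approximation
    ("OP", r"[+\-×÷]"),
    ("SKIP", r"\s+"),                        # whitespace, discarded
]

# One combined pattern with a named group per token type.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))


def tokenize(text: str):
    """Split text into (token name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Lexer generators work from a similar list of named regular expressions, but compile it into a finite automaton ahead of time instead of interpreting the patterns at run time.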