LING/C SC/PSYC 438/538. Lecture 7 9/15 Sandiway Fong. Administrivia. Reminder: Next Monday, Nirav Merchant from Bio5 will give a guest lecture, a break from the usual JM material. Also next Monday, the corpus reformatting homework is due. Today’s Topic. Section 2.2 of JM
LING/C SC/PSYC 438/538 Lecture 7 9/15 Sandiway Fong
Administrivia • Reminder • Next Monday, Nirav Merchant from Bio5 will give a guest lecture • a break from the usual JM material • Also next Monday, the corpus reformatting homework is due
Today’s Topic • Section 2.2 of JM • Finite State Automata
Regular Languages • Three formalisms: FSA, Regular Expressions, Regular Grammars • All formally equivalent (no difference in expressive power) • i.e. if you can encode it using a RE, you can do it using an FSA or regular grammar, and so on … • We will talk about formal equivalence next time
Regular Languages • A regular language • is the set of strings • (including possibly the empty string) • (set itself could also be empty) • (set can be infinite) • generated by a RE/FSA/Regular Grammar
Regular Languages • Example: • Language: L = { a+b+ } “one or more a’s followed by one or more b’s” • L is a regular language • described by a regular expression • Note: • infinite set of strings belonging to language L • e.g. abbb, aaaab, aabb, *abab, *λ • Notation: • λ is the empty string (or string with zero length), sometimes ε is used instead • * means string is not in the language
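Membership in L can be checked directly with a Perl regular expression. This is a minimal sketch; the sub name in_language is an illustrative choice, not something from the slides.

```perl
use strict;
use warnings;

# Returns 1 if the string is in L = { a+b+ }, else 0.
# The anchors ^ and \z force the pattern to match the whole string.
sub in_language {
    my ($string) = @_;
    return $string =~ /^a+b+\z/ ? 1 : 0;
}

# The example strings from the slide; '' is the empty string.
foreach my $string ("abbb", "aaaab", "aabb", "abab", "") {
    printf "%-8s %s\n", "'$string'", in_language($string) ? "in L" : "not in L";
}
```

The first three strings should be reported as in L, and abab and the empty string as not in L, matching the starred examples on the slide.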
Finite State Automata (FSA) • L = { a+b+ } can also be generated by the following FSA • [Diagram: > s -a-> x, x -a-> x, x -b-> y, y -b-> y] • > indicates start state • Red circle indicates end (accepting) state • we accept an input string only when we’re in an end state and we’re at the end of the string
Finite State Automata (FSA) • L = { a+b+ } can also be generated by the following FSA • [Diagram: > s -a-> x, x -a-> x, x -b-> y, y -b-> y] • There is a natural correspondence between components of the FSA and L • Note: L = {a+b+} and L = {aa*bb*} describe the same language
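The note that a+b+ and aa*bb* describe the same language can be spot-checked by comparing the two patterns on every short string over {a, b}. This is only an empirical check up to a chosen length, not a proof; the sub names and the length bound are illustrative assumptions.

```perl
use strict;
use warnings;

# Generate every string over {a, b} of length 0..$max_len.
sub all_strings {
    my ($max_len) = @_;
    my @all      = ("");
    my @frontier = ("");
    for (1 .. $max_len) {
        @frontier = map { ($_ . "a", $_ . "b") } @frontier;
        push @all, @frontier;
    }
    return @all;
}

# Count the strings on which the two patterns disagree.
sub count_disagreements {
    my ($max_len) = @_;
    my $count = 0;
    foreach my $s (all_strings($max_len)) {
        my $plus = ($s =~ /^a+b+\z/)   ? 1 : 0;
        my $star = ($s =~ /^aa*bb*\z/) ? 1 : 0;
        $count++ if $plus != $star;
    }
    return $count;
}

print "disagreements up to length 8: ", count_disagreements(8), "\n";
```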
Finite State Automata (FSA) • L = { a+b+ } can also be generated by the following FSA • [Diagram: > s -a-> x, x -a-> x, x -b-> y, y -b-> y] • deterministic FSA (DFSA) • no ambiguity about where to go at any given state • i.e. for each input symbol in the alphabet at any given state, there is a unique “action” to take • non-deterministic FSA (NDFSA) • no restriction on ambiguity (surprisingly, no increase in power)
Finite State Automata (FSA) • more formally • (Q,s,f,Σ,δ) • set of states (Q): {s,x,y} must be a finite set • start state (s): s • end state(s) (f): y • alphabet (Σ): {a, b} • transition function δ: signature: character × state → state • δ(a,s)=x • δ(a,x)=x • δ(b,x)=y • δ(b,y)=y • [Diagram: > s -a-> x, x -a-> x, x -b-> y, y -b-> y]
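The five components can be kept together in one Perl hash and checked for consistency. This is a sketch under assumptions: the key names (states, start, final, alphabet, delta) and the sub check_fsa are illustrative, not part of the lecture.

```perl
use strict;
use warnings;

# The five components from the slide, stored in one hash.
# Key names are illustrative choices.
my %fsa = (
    states   => [qw(s x y)],
    start    => "s",
    final    => [qw(y)],
    alphabet => [qw(a b)],
    delta    => {
        s => { a => "x" },
        x => { a => "x", b => "y" },
        y => { b => "y" },
    },
);

# Check that the machine only mentions known states and symbols.
# Returns a list of problems; an empty list means it is well formed.
sub check_fsa {
    my ($m) = @_;
    my %is_state  = map { $_ => 1 } @{ $m->{states} };
    my %is_symbol = map { $_ => 1 } @{ $m->{alphabet} };
    my @problems;
    push @problems, "start state not in Q" unless $is_state{ $m->{start} };
    foreach my $f (@{ $m->{final} }) {
        push @problems, "end state $f not in Q" unless $is_state{$f};
    }
    foreach my $from (sort keys %{ $m->{delta} }) {
        push @problems, "unknown state $from" unless $is_state{$from};
        foreach my $sym (sort keys %{ $m->{delta}{$from} }) {
            my $to = $m->{delta}{$from}{$sym};
            push @problems, "unknown symbol $sym" unless $is_symbol{$sym};
            push @problems, "unknown target $to"  unless $is_state{$to};
        }
    }
    return @problems;
}

my @problems = check_fsa(\%fsa);
print @problems ? join("\n", @problems) . "\n" : "machine is well formed\n";
```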
Finite State Automata (FSA) • In Perl • transition function δ: • δ(a,s)=x • δ(a,x)=x • δ(b,x)=y • δ(b,y)=y • We can simulate our 2D transition table using a hash whose elements are themselves hashes

    %transitiontable = (
        s => { a => "x" },
        x => { a => "x", b => "y" },
        y => { b => "y" }
    );

Example:

    print "$transitiontable{s}{a}\n";

Syntactic sugar: => is a comma that also quotes the bareword to its left, so the table above is the same as

    %transitiontable = (
        "s", { "a", "x", },
        "x", { "a", "x" , "b", "y" },
        "y", { "b", "y" },
    );

[Diagram: > s -a-> x, x -a-> x, x -b-> y, y -b-> y]
Finite State Automata (FSA) • Given transition table encoded as a hash • How to build a decider (Accept/Reject) in Perl? • Complications: • How about ε-transitions? • Multiple end states? • Multiple start states? • Non-deterministic FSA?
Finite State Automata (FSA)

    %transitiontable = (
        s => { a => "x" },
        x => { a => "x", b => "y" },
        y => { b => "y" }
    );
    @input = @ARGV;                     # one symbol per command-line argument
    $state = "s";                       # start state
    foreach $c (@input) {
        $state = $transitiontable{$state}{$c};
        last unless defined $state;     # no transition for this symbol: stop
    }
    if (defined $state and $state eq "y") {
        print "Accept\n";
    } else {
        print "Reject\n";
    }

• Example runs:

    perl fsm.prl a b a b
    Reject
    perl fsm.prl a a a b b
    Accept
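One of the complications raised earlier, multiple end states, can be handled by keeping the end states in a hash and testing membership at the end of the input. The following is a minimal sketch, not the lecture's own solution; the sub name decide and the hash %endstates are assumptions made for illustration.

```perl
use strict;
use warnings;

my %transitiontable = (
    s => { a => "x" },
    x => { a => "x", b => "y" },
    y => { b => "y" },
);

# Set of end states, stored as hash keys so membership is a single lookup.
my %endstates = ( y => 1 );

# Returns "Accept" or "Reject" for a list of input symbols.
sub decide {
    my ($table, $ends, $start, @input) = @_;
    my $state = $start;
    foreach my $c (@input) {
        $state = $table->{$state}{$c};
        return "Reject" unless defined $state;   # no transition for this symbol
    }
    return $ends->{$state} ? "Accept" : "Reject";
}

print decide(\%transitiontable, \%endstates, "s", qw(a b a b)), "\n";
print decide(\%transitiontable, \%endstates, "s", qw(a a a b b)), "\n";
```

The two calls repeat the example runs above and print Reject and then Accept. Adding a second end state only requires another key in %endstates.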
Finite State Automata (FSA) • practical applications • can be encoded and run efficiently on a computer • widely used • encode regular expressions • compress large dictionaries • morphological analyzers • Different word forms, e.g. want, wanted, unwanted (suffixation/prefixation) • see chapter 3 of textbook • speech recognizers • Markov models • = FSA + probabilities • and much more …
Finite State Automata (FSA) • T9 text entry (tegic.com) • built in to your cellphone • predictive text entry for mobile messaging/data entry • reduces the number of keystrokes for inputting words on a telephone keypad (8 keys) • how: 3 vs. 6 keystrokes • michael: 7 vs. 15 keystrokes
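The keystroke counts above are consistent with comparing one press per letter against multi-tap entry, where the n-th letter on a key needs n presses. The slide does not say how the counts were obtained, so that comparison is an assumption, and the sketch below only counts keystrokes: it does not do the dictionary lookup a predictive system needs to pick a word for a digit sequence. It assumes input made of the letters a to z.

```perl
use strict;
use warnings;

# Standard telephone keypad letter groups (keys 2-9).
my %keypad = (
    2 => "abc", 3 => "def", 4 => "ghi", 5 => "jkl",
    6 => "mno", 7 => "pqrs", 8 => "tuv", 9 => "wxyz",
);

# For each letter: which key it is on, and how many presses multi-tap needs.
my (%key_of, %taps_of);
foreach my $key (keys %keypad) {
    my @letters = split //, $keypad{$key};
    foreach my $i (0 .. $#letters) {
        $key_of{ $letters[$i] }  = $key;
        $taps_of{ $letters[$i] } = $i + 1;
    }
}

# Digit sequence typed with one press per letter.
sub digit_sequence {
    my ($word) = @_;
    return join "", map { $key_of{$_} } split //, lc $word;
}

# Total presses needed with multi-tap entry.
sub multitap_keystrokes {
    my ($word) = @_;
    my $total = 0;
    $total += $taps_of{$_} for split //, lc $word;
    return $total;
}

foreach my $word (qw(how michael)) {
    printf "%s: %s, %d vs. %d keystrokes\n",
        $word, digit_sequence($word), length($word), multitap_keystrokes($word);
}
```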
ε-transitions • jump from one state to another state with the empty character • ε-transition (textbook) or λ-transition • no increase in expressive power • examples • [Diagrams: two example FSAs over {a, b}, each containing an ε-transition] • what’s the equivalent without the ε-transition?
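A common way to reason about ε-transitions is to compute, for a set of states, everything reachable through ε-arcs alone. The sketch below uses a hypothetical three-state machine, not either of the machines drawn on the slide; the key "eps" for ε-arcs and the sub name epsilon_closure are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical machine (not one of the slide's examples):
# 1 -a-> 2, 2 -b-> 3, and an ε-arc 1 -> 3, stored under the key "eps".
# Each symbol maps to a list of target states.
my %table = (
    1 => { a => [2], eps => [3] },
    2 => { b => [3] },
    3 => {},
);

# All states reachable from the given states using only ε-arcs,
# including the given states themselves.
sub epsilon_closure {
    my ($table, @states) = @_;
    my %seen  = map { $_ => 1 } @states;
    my @queue = @states;
    while (@queue) {
        my $state = shift @queue;
        foreach my $next (@{ $table->{$state}{eps} || [] }) {
            next if $seen{$next}++;     # already visited
            push @queue, $next;
        }
    }
    my @closure = sort keys %seen;
    return @closure;
}

print join(",", epsilon_closure(\%table, 1)), "\n";
```

For this hypothetical machine the closure of state 1 is the set {1, 3}, while state 2 has no ε-arcs and its closure is just {2}.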
Non-Deterministic Finite State Automata (NDFSA) • non-deterministic FSA (NDFSA) • no restriction on ambiguity (surprisingly, no increase in power) • Example: • [Diagram: NDFSA with states s, x, y, z and arcs labeled a and b]
Non-Deterministic Finite State Automata (NDFSA) • Strategies for keeping track of multiple states • Backtracking (backup) • Lookahead • Parallelism • algorithm gets complicated fast
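Of the strategies listed, parallelism is perhaps the easiest to sketch in Perl: instead of one current state, keep the set of all states the machine could be in. The machine below is hypothetical (it is not the NDFSA from the slides), and the names nd_decide and %endstates are illustrative. It has no ε-arcs.

```perl
use strict;
use warnings;

# Hypothetical NDFSA: accepts strings over {a, b} that end in "ab".
# In state s, the symbol a leads to both s and x, so each symbol
# maps to a list of target states.
my %table = (
    s => { a => [qw(s x)], b => [qw(s)] },
    x => { b => [qw(y)] },
    y => {},
);
my %endstates = ( y => 1 );

# Parallelism: track every state the machine could currently be in.
sub nd_decide {
    my ($table, $ends, $start, @input) = @_;
    my %current = ( $start => 1 );
    foreach my $c (@input) {
        my %next;
        foreach my $state (keys %current) {
            $next{$_} = 1 for @{ $table->{$state}{$c} || [] };
        }
        %current = %next;
        last unless %current;           # every path is stuck
    }
    # Accept if at least one possible state is an end state.
    my $accepting = grep { $ends->{$_} } keys %current;
    return $accepting ? "Accept" : "Reject";
}

print nd_decide(\%table, \%endstates, "s", qw(a a b)), "\n";
print nd_decide(\%table, \%endstates, "s", qw(a b a)), "\n";
```

The first call prints Accept (the possible states after a a b are s and y) and the second prints Reject (after a b a they are s and x).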
NDFSA → (D)FSA [discussed at the end of section 2.2 in the textbook] • construct a new machine • each state of the new machine represents the set of possible states of the original machine when stepping through the input • Note: • new machine is equivalent to old one (but may have more states) • new machine is deterministic • example [Powerpoint Animation] • [Diagram: the NDFSA with states s, x, y, z and the new machine with states s, {x,y}, {z}, {y,z}, {y}]
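The set-of-states construction can be sketched in Perl as a worklist over sets of original states. The input machine here is hypothetical (strings over {a, b} ending in "ab"), not the one animated on the slide, and it has no ε-arcs; the sub names set_name and determinize are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical NDFSA: each symbol maps to a list of target states.
my %nd_table = (
    s => { a => [qw(s x)], b => [qw(s)] },
    x => { b => [qw(y)] },
    y => {},
);

# Name a set of states, e.g. (x, s) becomes "{s,x}".
sub set_name {
    my (@states) = @_;
    return "{" . join(",", sort @states) . "}";
}

# Build a deterministic table whose states are sets of original states.
# Returns (start name, table reference, reference to hash of end-state names).
sub determinize {
    my ($table, $start, $ends, @alphabet) = @_;
    my $start_name = set_name($start);
    my %members    = ( $start_name => [$start] );
    my (%dfa, %dfa_ends);
    my @todo = ($start_name);
    while (@todo) {
        my $name = shift @todo;
        $dfa{$name} ||= {};
        # A set is an end state if it contains any original end state.
        $dfa_ends{$name} = 1 if grep { $ends->{$_} } @{ $members{$name} };
        foreach my $c (@alphabet) {
            my %targets;
            foreach my $state (@{ $members{$name} }) {
                $targets{$_} = 1 for @{ $table->{$state}{$c} || [] };
            }
            next unless %targets;               # no arc: leave it undefined
            my $target_name = set_name(keys %targets);
            $dfa{$name}{$c} = $target_name;
            next if $members{$target_name};     # already discovered
            $members{$target_name} = [ keys %targets ];
            push @todo, $target_name;
        }
    }
    return ($start_name, \%dfa, \%dfa_ends);
}

my ($start, $dfa, $dfa_ends) = determinize(\%nd_table, "s", { y => 1 }, qw(a b));
foreach my $name (sort keys %$dfa) {
    foreach my $c (sort keys %{ $dfa->{$name} }) {
        printf "%s -%s-> %s\n", $name, $c, $dfa->{$name}{$c};
    }
}
```

Only sets that are actually reachable from the start state are created, so for this input the new machine has three states: {s}, {s,x} and {s,y}, with {s,y} as the only end state.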
Ungraded Homework • Do not submit, but please do the following exercise to check your understanding of the material • Apply the set-of-states construction technique to the two machines on the ε-transition slide • Check your answer: • verify in each case that the machine produced is deterministic and accurately simulates its ε-transition counterpart