LING 438/538 Computational Linguistics Sandiway Fong Lecture 11: 10/3
Administrivia • homework 2 will be returned tomorrow (by email) • homework 3 will be out on Thursday
Last Tuesday • textbook • Chapter 2: Regular Expressions and Finite State Automata • regular expressions • Unix grep and • wildcard search in Microsoft Word • implementing the FSA in Prolog • Method 1: • two line program fsa/2 + • transition/3 (δ function) and final_state/1 • Method 2: • define each state, e.g. x, as a predicate, e.g. x/1, • taking the input list as an argument • non-determinism handled by Prolog’s computation rule
Today’s Topic • more on FSA • expressive power • limits
Determinism • deterministic FSA (DFSA) • no ambiguity about where to go at any given state • non-deterministic FSA (NDFSA) • no restriction on ambiguity (surprisingly, no increase in formal power) • textbook • D-RECOGNIZE (FIGURE 2.13) • ND-RECOGNIZE (FIGURE 2.21)
fsa(S,L) :- L = [C|M], transition(S,C,T), fsa(T,M).
fsa(E,[]) :- end_state(E).
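To make the point about non-determinism concrete, here is a minimal sketch of the two-clause recognizer running on a small NDFSA. The machine (states q0, q1, q2, accepting strings over {a, b} that end in "ab") is a hypothetical example, not one from the textbook. The base case is written as fsa(E,[]) :- end_state(E), so that the state reached at the end of the input is the one checked, and the list pattern [C|M] sits in the clause head, which is equivalent to L = [C|M].

```prolog
% Hypothetical NDFSA over {a,b}: accepts strings that end in "ab".
% State q0 has two a-arcs (to q0 and to q1), so the machine is
% non-deterministic; Prolog's backtracking tries both.
transition(q0, a, q0).
transition(q0, b, q0).
transition(q0, a, q1).
transition(q1, b, q2).

end_state(q2).

% Two-clause recognizer: consume one symbol and move on,
% or stop when the input is empty and the current state is an end state.
fsa(S, [C|M]) :- transition(S, C, T), fsa(T, M).
fsa(E, [])    :- end_state(E).
```

With these clauses loaded, the query fsa(q0, [a,b,a,b]) succeeds and fsa(q0, [a,b,a]) fails. No extra code is needed for the ambiguity at q0; the computation rule explores the alternatives.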
NDFSA → (D)FSA [discussed at the end of section 2.2 in the textbook] • construct a new machine • each state of the new machine represents the set of possible states of the original machine when stepping through the input • Note: • new machine is equivalent to old one (but may have more states) • new machine is deterministic • example • [figure: an NDFSA over {a, b} with states s, x, y, z, and the deterministic machine built from it, with states s, {x,y}, {z}, {y,z}, {y}]
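The set-of-states idea can be sketched in Prolog by stepping a whole set of states through the input at once, rather than building the new machine up front. This is only an illustration under assumptions: the NDFSA below is a hypothetical one (strings ending in "ab"), the predicate names nd_transition/3, nd_end_state/1, step/3 and set_fsa/2 are made up for the example, and member/2 is taken from library(lists) as in SWI-Prolog.

```prolog
:- use_module(library(lists)).   % member/2 (SWI-Prolog assumed)

% Hypothetical NDFSA: accepts strings over {a,b} ending in "ab".
nd_transition(q0, a, q0).
nd_transition(q0, b, q0).
nd_transition(q0, a, q1).
nd_transition(q1, b, q2).

nd_end_state(q2).

% step(+States, +Char, -Next): Next is the set of all states reachable
% from some member of States on Char. sort/2 removes duplicates, so each
% set has exactly one representation (one state of the new machine).
step(States, C, Next) :-
    findall(T, (member(S, States), nd_transition(S, C, T)), Ts),
    sort(Ts, Next).

% set_fsa(+States, +Input): deterministic recognizer over sets of states.
% An empty set means the original machine has no way to continue.
set_fsa(States, [C|M]) :- step(States, C, Next), Next \== [], set_fsa(Next, M).
set_fsa(States, [])    :- member(S, States), nd_end_state(S), !.
```

Starting from [q0], reading a gives [q0,q1] and reading b from there gives [q0,q2]. Each distinct list that shows up this way corresponds to one state of the deterministic machine, and there is exactly one next set for every set and character.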
ε-transitions • jump from one state to another state with the empty character • ε-transition (textbook) or λ-transition • no increase in expressive power • examples • [figures: example FSAs with arcs labelled a, b and ε] • what’s the equivalent without the ε-transition?
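One way to see that ε-transitions add no expressive power is to write a machine both ways. The sketch below uses a hypothetical machine (not the one in the slide figure) that accepts "ab" or "b": an ε-arc lets it skip the a. The predicate names are made up for the example, and the ε-aware recognizer assumes the machine has no cycle of ε-arcs, since such a cycle would make it loop.

```prolog
% Version with an ε-arc: s -a-> x, x -b-> y, plus s -ε-> x.
eps_transition(s, a, x).
eps_transition(x, b, y).
epsilon(s, x).                 % move from s to x without reading input
eps_end_state(y).

eps_fsa(S, [C|M]) :- eps_transition(S, C, T), eps_fsa(T, M).
eps_fsa(S, L)     :- epsilon(S, T), eps_fsa(T, L).   % input list unchanged
eps_fsa(E, [])    :- eps_end_state(E).

% Equivalent version without ε: whatever x can do after the jump
% (read b and go to y) is copied onto s as an ordinary arc.
plain_transition(s, a, x).
plain_transition(s, b, y).
plain_transition(x, b, y).
plain_end_state(y).

plain_fsa(S, [C|M]) :- plain_transition(S, C, T), plain_fsa(T, M).
plain_fsa(E, [])    :- plain_end_state(E).
```

Both eps_fsa(s, L) and plain_fsa(s, L) accept exactly [a,b] and [b].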
Start State(s) • Finite State Automata (FSA) • (Q,s,f,Σ,δ) • set of states (Q): {s,x,y} must be a finite set • start state (s): s • end state(s) (f): y • alphabet (Σ): {a, b} • transition function δ: signature: character × state → state • δ(a,s)=x • δ(a,x)=x • δ(b,x)=y • δ(b,y)=y • [figure: state diagram of this machine, start state marked with >]
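The five components above map directly onto Prolog facts. A minimal sketch using the Method 1 encoding follows; start_state/1 and accepts/1 are names added here for convenience and are not part of the two-line program.

```prolog
% δ written as transition(From, Char, To):
% δ(a,s)=x, δ(a,x)=x, δ(b,x)=y, δ(b,y)=y
transition(s, a, x).
transition(x, a, x).
transition(x, b, y).
transition(y, b, y).

start_state(s).     % hypothetical helper: names the start state
end_state(y).

fsa(S, [C|M]) :- transition(S, C, T), fsa(T, M).
fsa(E, [])    :- end_state(E).

% accepts(+Input): run the machine from its start state.
accepts(L) :- start_state(S), fsa(S, L).
```

The machine accepts one or more a's followed by one or more b's, so accepts([a,a,b,b]) succeeds while accepts([a]), accepts([b]) and accepts([a,b,a]) fail. The set Q and the alphabet Σ are implicit in the transition/3 facts.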
FSA Properties • FSAs (and thus regular languages) are preserved, i.e. maintain their FSA nature, under... • concatenation • union • intersection • complementation • and other operations... • [see section 2.3 of textbook]
concatenation • concatenate two FSAs, result is a FSA • trick: use ε-transitions to link the automatons • example • [figure 2.24]
union • disjunction (union) of two FSAs, result is a FSA • trick: use ε-transitions to link the automatons • example • [figure 2.26]
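The ε-linking trick for concatenation and for union can be sketched on one pair of machines. Everything here is a hypothetical example rather than the textbook's figures: M1 accepts a+, M2 accepts b+, and the predicate names arc/3, eps/3, final/2, start/2, run/3 and accepts/2 are made up for the illustration. The first argument of eps/3, final/2 and start/2 selects which composite machine is meant.

```prolog
% Component machines, shared by both constructions.
arc(p0, a, p1).  arc(p1, a, p1).     % M1: start p0, end state p1 (a+)
arc(r0, b, r1).  arc(r1, b, r1).     % M2: start r0, end state r1 (b+)

% eps(Machine, From, To): the ε-arcs that link the components.
eps(concat, p1, r0).     % M1's end state to M2's start state
eps(union,  u0, p0).     % new start state u0 to M1's start state
eps(union,  u0, r0).     % ... and to M2's start state

% final(Machine, State) and start(Machine, State)
final(concat, r1).       % only M2's end state stays final
final(union,  p1).
final(union,  r1).
start(concat, p0).
start(union,  u0).

run(M, S, [C|L]) :- arc(S, C, T), run(M, T, L).
run(M, S, L)     :- eps(M, S, T), run(M, T, L).   % ε: input unchanged
run(M, E, [])    :- final(M, E).

accepts(M, L) :- start(M, S), run(M, S, L).
```

Here accepts(concat, L) holds for strings in a+b+, and accepts(union, L) holds for strings that are all a's or all b's (at least one symbol). The component arcs are untouched in both cases; only the ε-arcs and the choice of start and end states differ.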
intersection • (conjunction) intersect two FSAs, result is a FSA • trick: use (modified) set-of-states construction • example • [figure: an FSA with states s1, x, y; an FSA with states s2, z; and the combined machine with states {s1,s2}, {x,s2}, {y,z}] • look familiar? • that’s because • a+b* ∩ a*b+ = a+b+
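The modified set-of-states construction can be sketched by running both machines in lockstep on the same input, with a pair of states as the combined state. The arcs below are an assumption: they are one reading of the two machines that is consistent with the state names in the example and with a+b* ∩ a*b+ = a+b+. The predicate names t1/3, f1/1, t2/3, f2/1 and both/2 are made up for the sketch.

```prolog
% Machine 1 (a+b*): start s1, end states x and y.
t1(s1, a, x).  t1(x, a, x).  t1(x, b, y).  t1(y, b, y).
f1(x).  f1(y).

% Machine 2 (a*b+): start s2, end state z.
t2(s2, a, s2).  t2(s2, b, z).  t2(z, b, z).
f2(z).

% both(+S1-S2, +Input): the combined state is the pair S1-S2.
% Both machines must be able to move on the same character, and at the
% end of the input both must be in an end state.
both(S1-S2, [C|M]) :- t1(S1, C, T1), t2(S2, C, T2), both(T1-T2, M).
both(S1-S2, [])    :- f1(S1), f2(S2).
```

Starting from s1-s2, the only pairs ever reached are s1-s2, x-s2 and y-z, matching the three states of the combined machine, and both(s1-s2, L) accepts exactly the strings in a+b+.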
complementation • (complementation) the negation or opposite FSA • with respect to Σ* • the set of all possible strings from the alphabet • i.e. accepts everything original FSA rejects • and rejects everything original FSA accepts • result is still a FSA
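For a deterministic machine, one standard way to get the complement is to make the transition function total by adding a dead state, then swap end and non-end states. The sketch below does this for a deterministic a+b+ machine; the names symbol/1, total_step/3, co_fsa/2 and the state dead are hypothetical, chosen for the illustration.

```prolog
symbol(a).
symbol(b).

% Deterministic machine for a+b+ (start state s, end state y).
transition(s, a, x).
transition(x, a, x).
transition(x, b, y).
transition(y, b, y).

end_state(y).

% total_step(+S, +C, -T): use the original arc when there is one,
% otherwise fall into the dead state (which only leads to itself).
total_step(S, C, T)    :- transition(S, C, T), !.
total_step(_, _, dead).

% co_fsa(+S, +Input): recognizer for the complement with respect to Σ*.
% Input symbols must come from Σ; acceptance is flipped at the end.
co_fsa(S, [C|M]) :- symbol(C), total_step(S, C, T), co_fsa(T, M).
co_fsa(S, [])    :- \+ end_state(S).
```

So co_fsa(s, [a,b]) fails because the original accepts ab, while co_fsa(s, []), co_fsa(s, [b,a]) and co_fsa(s, [a,b,a]) succeed. A list containing a symbol outside {a, b} is rejected by both machines, since it is not in Σ* at all.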
Limits of Finite State Technology • Language = set of strings • case 1 • suppose set is finite • e.g. L = {ba, abc, ccb, dd} • easy to encode as a FSA (by closure under union) • [figure: one linear FSA per string (ba, abc, ccb, dd), each reached by an ε-transition from a new start state s0] • case 2 • set is infinite • ...
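The finite case can be sketched very compactly in Prolog. In this illustration (predicate names word/1, in_language/1 and chain/2 are hypothetical), choosing a word plays the role of the ε-jump from the new start state into one chain, and the part of the word still to be read plays the role of the current state. There are only finitely many such remainders, which is why a finite set of strings always fits in a finite machine.

```prolog
% The finite language from the example.
word([b,a]).
word([a,b,c]).
word([c,c,b]).
word([d,d]).

% in_language(+Input): pick one chain, then walk it.
in_language(L) :- word(W), chain(W, L).

% chain(+Remaining, +Input): the remaining suffix of the chosen word acts
% as the state; each step must read exactly the next expected character.
chain([C|Rest], [C|M]) :- chain(Rest, M).
chain([], []).
```

in_language([a,b,c]) and in_language([d,d]) succeed; in_language([a,b]) and in_language([d,d,d]) fail.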
Limits of Finite State Technology • Language = set of strings • case 2 • set is infinite • e.g. L = a+b+ = { ab, aab, abb, aabb, aaab, abbb, … } • “one or more a’s followed by one or more b’s” • we know this set is regular • [figure: FSA with states s, x, y and arcs labelled a, a, b, b] • however, consider L = {aⁿbⁿ | n ≥ 1} = { ab, aabb, aaabbb, … } • “same number of b’s as a’s…” • this set is not regular. Why?
The Limits of Finite State Technology • [Formally, we can use the Pumping Lemma to prove this particular case.] • informally, • we can build FSA for… • ab • aabb • aaabbb • … • [figure: one linear FSA each for ab, aabb and aaabbb, with end states marked]
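For reference, the formal argument mentioned in brackets can be sketched as follows. This is the usual textbook-style pumping argument written out for this language, with p standing for the pumping length; it is a sketch, not a quotation from the course materials.

```latex
\begin{align*}
&\text{Assume } L = \{a^n b^n \mid n \ge 1\} \text{ is regular, with pumping length } p.\\
&\text{Take } w = a^p b^p \in L, \text{ so } |w| = 2p \ge p.\\
&\text{For any split } w = xyz \text{ with } |xy| \le p \text{ and } |y| \ge 1:\quad
  y = a^k \text{ for some } k \ge 1.\\
&\text{Pumping once gives } x y^2 z = a^{p+k} b^p \notin L, \text{ since } p + k \ne p.\\
&\text{This contradicts the lemma, so } L \text{ is not regular.}
\end{align*}
```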
The Limits of Finite State Technology • we can merge the individual FSA for… • ab • aabb • aaabbb • [figure: the merged FSA, with arcs labelled a and b] • such direct encoding would require an infinite number of states • and we’re using Finite State Automata • quite different from the infinity obtained by looping • freely iterate (no counting)
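The contrast between looping and counting can be made concrete with a recognizer that does count. The sketch below is deliberately not a finite-state machine: it carries an integer that can grow without bound, which is exactly the kind of memory an FSA lacks. The predicate names anbn/1, as/4 and bs/2 are hypothetical.

```prolog
% anbn(+Input): Input is n a's followed by exactly n b's, n >= 1.
anbn(L) :- as(L, 0, N, Rest), N >= 1, bs(Rest, N).

% as(+Input, +Count0, -Count, -Rest): read leading a's, counting them.
as([a|T], N0, N, Rest) :- N1 is N0 + 1, as(T, N1, N, Rest).
as(Rest, N, N, Rest).

% bs(+Input, +Count): read b's, counting down; must hit zero
% exactly when the input runs out.
bs([], 0).
bs([b|T], N) :- N > 0, N1 is N - 1, bs(T, N1).
```

anbn([a,a,b,b]) succeeds, while anbn([a,a,b]), anbn([a,b,b]) and anbn([a,b,a,b]) fail. A looping machine for a+b+ accepts all of the first three, because nothing in its finite set of states records how many a's were read.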
The Limits of Finite State Technology • example • L = a+b+ = { ab, abb, aab, aabb, aaab, abbb, … } • “one or more a’s followed by one or more b’s” • [figure: linear FSAs over states s1 to s5 for the individual strings ab, aab, abb, aabb, aaab, abbb] • Note: • can be divided into two independent halves • each half can be replaced by iteration