
LING 438/538 Computational Linguistics

Sandiway Fong. Lecture 11: 10/3.


Presentation Transcript


  1. LING 438/538 Computational Linguistics • Sandiway Fong • Lecture 11: 10/3

  2. Administrivia • homework 2 will be returned tomorrow (by email) • homework 3 will be out on Thursday

  3. Last Tuesday • textbook Chapter 2: Regular Expressions and Finite State Automata • regular expressions: Unix grep and wildcard search in Microsoft Word • implementing the FSA in Prolog • Method 1: two-line program fsa/2 + transition/3 (δ function) and final_state/1 • Method 2: define each state, e.g. x, as a predicate, e.g. x/1, taking the input list as an argument • non-determinism handled by Prolog’s computation rule

  4. Today’s Topic • more on FSA • expressive power • limits

  5. Determinism • deterministic FSA (DFSA): no ambiguity about where to go at any given state • non-deterministic FSA (NDFSA): no restriction on ambiguity (surprisingly, no increase in formal power) • textbook: D-RECOGNIZE (FIGURE 2.13), ND-RECOGNIZE (FIGURE 2.21) • Prolog:
     fsa(S,L) :- L = [C|M], transition(S,C,T), fsa(T,M).
     fsa(E,[]) :- end_state(E).
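The two-clause fsa/2 recognizer can be mirrored in Python as a rough sketch (Python is used here in place of the lecture's Prolog). The machine and the names TRANSITIONS, END_STATES and accepts are illustrative assumptions, not taken from the slides. Trying every matching transition in turn plays the role of Prolog's backtracking, so the same code copes with a non-deterministic machine.

```python
# Hypothetical NDFSA for illustration: on 'a', state s may move to x or to y.
TRANSITIONS = [
    ("s", "a", "x"),
    ("s", "a", "y"),
    ("x", "b", "x"),
    ("y", "c", "y"),
]
END_STATES = {"x", "y"}


def accepts(state, symbols):
    """True if some path from `state` consumes `symbols` and stops in an end state."""
    if not symbols:
        return state in END_STATES
    first, rest = symbols[0], symbols[1:]
    # Try each transition that matches the current state and symbol.
    return any(
        accepts(target, rest)
        for source, symbol, target in TRANSITIONS
        if source == state and symbol == first
    )
```

For example, accepts("s", "abb") explores the branch through x, while accepts("s", "acc") only succeeds through y.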

  6. NDFSA → (D)FSA [discussed at the end of section 2.2 in the textbook] • construct a new machine • each state of the new machine represents the set of possible states of the original machine when stepping through the input • Note: new machine is equivalent to old one (but has more states) • new machine is deterministic • example [figure: original NDFSA with states s, x, y, z over a and b; new machine with states s, {x,y}, {z}, {y,z}, {y}]
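The set-of-states construction can be sketched as follows. This is a minimal version for machines without ε-transitions; the example NDFSA (strings ending in "ab") is a hypothetical stand-in, since the machine in the slide's figure is not recoverable from the transcript, and the function names are assumptions.

```python
def determinize(nfa, start, finals):
    """Subset construction for an NDFSA without ε-transitions.

    nfa maps (state, symbol) -> set of next states.
    Returns (dfa, dfa_start, dfa_finals); each new state is a frozenset of old states.
    """
    alphabet = {symbol for (_, symbol) in nfa}
    dfa_start = frozenset([start])
    dfa, todo, seen = {}, [dfa_start], {dfa_start}
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            target = frozenset(
                t for q in current for t in nfa.get((q, symbol), ())
            )
            if not target:
                continue  # no move possible: leave the new machine partial
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A set-state is an end state if it contains an original end state.
    dfa_finals = {state for state in seen if state & finals}
    return dfa, dfa_start, dfa_finals


# Hypothetical NDFSA: strings over {a, b} that end in "ab".
NFA = {
    ("s", "a"): {"s", "x"},
    ("s", "b"): {"s"},
    ("x", "b"): {"y"},
}
DFA, DFA_START, DFA_FINALS = determinize(NFA, "s", {"y"})


def run(word):
    """Step the deterministic machine through `word`."""
    state = DFA_START
    for symbol in word:
        state = DFA.get((state, symbol))
        if state is None:
            return False
    return state in DFA_FINALS
```

Only set-states reachable from the start are built, which keeps the new machine small in practice even though the worst case is exponential in the number of original states.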

  7. ε-transitions • jump from one state to another on the empty character • ε-transition (textbook) or λ-transition • no increase in expressive power • examples [figures: small automata over a and b, one containing an ε-transition] • what’s the equivalent without the ε-transition?
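One way to see why ε-transitions add no expressive power is to simulate them with an ε-closure: the set of states reachable for free before and after each input character. The sketch below assumes the empty string stands for ε, and the three-state machine is a hypothetical example, not one of the slide's figures.

```python
EPS = ""  # the empty string stands for ε in this sketch


def eps_closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for t in delta.get((q, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def accepts(word, delta, start, finals):
    """Track the set of possible states, closing under ε after every step."""
    current = eps_closure({start}, delta)
    for symbol in word:
        moved = {t for q in current for t in delta.get((q, symbol), ())}
        current = eps_closure(moved, delta)
    return bool(current & finals)


# Hypothetical machine: 1 -a-> 2, 2 -ε-> 3, 3 -b-> 3 (an a followed by any number of b's).
DELTA = {(1, "a"): {2}, (2, EPS): {3}, (3, "b"): {3}}
```

Because the closure sets are themselves finite, they can serve as the states of an ordinary machine with no ε-transitions at all.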

  8. Start State(s) • Finite State Automata (FSA) • (Q,s,f,Σ,δ) • set of states (Q): {s,x,y}, must be a finite set • start state (s): s • end state(s) (f): y • alphabet (Σ): {a, b} • transition function δ: signature: character × state → state • δ(a,s)=x • δ(a,x)=x • δ(b,x)=y • δ(b,y)=y • [figure: state diagram of this machine; > marks the start state]
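The components listed on this slide can be written down directly as data. The sketch below keeps the slide's argument order (character first, then state) for the transition function; the Python names and the recognize helper are assumptions added for illustration.

```python
STATES = {"s", "x", "y"}   # Q
START = "s"                # s
END_STATES = {"y"}         # f
ALPHABET = {"a", "b"}      # Σ
# Transition function with the slide's signature: character × state → state
DELTA = {
    ("a", "s"): "x",
    ("a", "x"): "x",
    ("b", "x"): "y",
    ("b", "y"): "y",
}


def recognize(word):
    """Deterministic recognition: exactly one possible move per step, or none."""
    state = START
    for character in word:
        if (character, state) not in DELTA:
            return False  # no transition defined: reject
        state = DELTA[(character, state)]
    return state in END_STATES
```

With these four transitions the machine accepts one or more a's followed by one or more b's.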

  9. FSA Properties • FSAs (and thus regular languages) are preserved, i.e. maintain their FSA nature, under... • concatenation • union • intersection • complementation • and other operations... • [see section 2.3 of textbook]

  10. concatenation • concatenate two FSAs, result is an FSA • trick: use ε-transitions to link the automata • example • [figure 2.24]

  11. union • disjunction (union) of two FSAs, result is an FSA • trick: use ε-transitions to link the automata • example • [figure 2.26]
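The ε-linking trick for concatenation and union can be sketched as below. A machine is assumed to be a triple (delta, start, finals) with delta mapping (state, symbol) to a set of states; tag, union, concat and accepts are hypothetical helper names, and the a+ and b+ machines are illustrative rather than the textbook's figures.

```python
EPS = ""  # the empty string stands for ε


def tag(machine, label):
    """Rename states so that the two machines cannot clash."""
    delta, start, finals = machine
    new_delta = {
        ((label, q), sym): {(label, t) for t in targets}
        for (q, sym), targets in delta.items()
    }
    return new_delta, (label, start), {(label, f) for f in finals}


def union(m1, m2):
    d1, s1, f1 = tag(m1, 1)
    d2, s2, f2 = tag(m2, 2)
    # New start state with ε-transitions to both old start states.
    delta = {**d1, **d2, ("start", EPS): {s1, s2}}
    return delta, "start", f1 | f2


def concat(m1, m2):
    d1, s1, f1 = tag(m1, 1)
    d2, s2, f2 = tag(m2, 2)
    delta = {**d1, **d2}
    for f in f1:
        # ε-transition from each end state of the first machine to the second start.
        delta.setdefault((f, EPS), set()).add(s2)
    return delta, s1, f2


def accepts(machine, word):
    delta, start, finals = machine

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for t in delta.get((stack.pop(), EPS), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for sym in word:
        current = closure({t for q in current for t in delta.get((q, sym), ())})
    return bool(current & finals)


A_PLUS = ({("p", "a"): {"q"}, ("q", "a"): {"q"}}, "p", {"q"})  # a+
B_PLUS = ({("p", "b"): {"q"}, ("q", "b"): {"q"}}, "p", {"q"})  # b+
```

Neither construction touches the internals of the two input machines; all the work is done by the handful of added ε-transitions.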

  12. intersection • (conjunction) intersect two FSAs, result is an FSA • trick: use (modified) set-of-states construction • example [figure: first FSA with states s1, x, y; second FSA with states s2, z; combined machine with states {s1,s2}, {x,s2}, {y,z}] • look familiar? • that’s because a+b* ∩ a*b+ = a+b+
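A sketch of the modified set-of-states construction: each state of the new machine is a pair holding one state from each input machine, and a pair moves only when both machines can move on the same character. The encodings of the a+b* and a*b+ machines below are assumptions consistent with the state names on the slide, and the final loop checks the claimed identity against Python's re module for all strings up to length 6.

```python
import re
from itertools import product


def run(dfa, start, finals, word):
    state = start
    for symbol in word:
        state = dfa.get((state, symbol))
        if state is None:
            return False
    return state in finals


def intersect(dfa1, start1, finals1, dfa2, start2, finals2):
    """Pair construction over two deterministic machines."""
    dfa, todo = {}, [(start1, start2)]
    seen = {(start1, start2)}
    symbols = {sym for (_, sym) in dfa1} & {sym for (_, sym) in dfa2}
    while todo:
        p, q = todo.pop()
        for sym in symbols:
            if (p, sym) in dfa1 and (q, sym) in dfa2:
                target = (dfa1[(p, sym)], dfa2[(q, sym)])
                dfa[((p, q), sym)] = target
                if target not in seen:
                    seen.add(target)
                    todo.append(target)
    # A pair is an end state only if both components are end states.
    finals = {(p, q) for (p, q) in seen if p in finals1 and q in finals2}
    return dfa, (start1, start2), finals


# a+b*: assumed encoding with states s1, x, y (x and y are end states)
M1 = {("s1", "a"): "x", ("x", "a"): "x", ("x", "b"): "y", ("y", "b"): "y"}
# a*b+: assumed encoding with states s2, z (z is the end state)
M2 = {("s2", "a"): "s2", ("s2", "b"): "z", ("z", "b"): "z"}
BOTH, BOTH_START, BOTH_FINALS = intersect(M1, "s1", {"x", "y"}, M2, "s2", {"z"})

# Compare the combined machine with the regular expression a+b+.
for n in range(7):
    for letters in product("ab", repeat=n):
        word = "".join(letters)
        expected = bool(re.fullmatch(r"a+b+", word))
        assert run(BOTH, BOTH_START, BOTH_FINALS, word) == expected
```

The reachable pairs are (s1,s2), (x,s2) and (y,z), which correspond to the three states of the combined machine on the slide.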

  13. complementation • (complementation) the negation or opposite FSA • with respect to Σ*, the set of all possible strings from the alphabet • i.e. accepts everything the original FSA rejects • and rejects everything the original FSA accepts • result is still an FSA
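One standard way to build the complement, sketched here under the assumption that the machine is deterministic: first add a non-accepting sink state so that every (state, character) pair has a transition, then swap end states and non-end states. The helper names and the state name "sink" are hypothetical and assumed not to clash with existing states.

```python
def complete(dfa, states, alphabet, sink="sink"):
    """Add a sink state so every (state, symbol) pair has a transition."""
    total = dict(dfa)
    for state in set(states) | {sink}:
        for symbol in alphabet:
            total.setdefault((state, symbol), sink)
    return total, set(states) | {sink}


def complement(dfa, states, finals, alphabet):
    total, all_states = complete(dfa, states, alphabet)
    # Swap accepting and non-accepting states.
    return total, all_states - set(finals)


def run(dfa, start, finals, word):
    state = start
    for symbol in word:
        state = dfa[(state, symbol)]  # total machine: a move always exists
    return state in finals


# A deterministic machine for a+b+ over Σ = {a, b}
A_PLUS_B_PLUS = {("s", "a"): "x", ("x", "a"): "x", ("x", "b"): "y", ("y", "b"): "y"}
NOT_DFA, NOT_FINALS = complement(A_PLUS_B_PLUS, {"s", "x", "y"}, {"y"}, {"a", "b"})
```

The completion step matters: without the sink, strings such as "ba" would fall off the machine and be rejected by both the original and the swapped version.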

  14. Limits of Finite State Technology • Language = set of strings • case 1: suppose set is finite • e.g. L = {ba, abc, ccb, dd} • easy to encode as an FSA (by closure under union) • [figure: start state s0 with ε-transitions to four chains of states spelling ba, abc, ccb and dd] • case 2: set is infinite • ...

  15. Limits of Finite State Technology • Language = set of strings • case 2: set is infinite • e.g. L = a+b+ = { ab, aab, abb, aabb, aaab, abbb, … } • “one or more a’s followed by one or more b’s” • we know this set is regular • [figure: FSA for a+b+ with states s, x, y] • however, consider L = {aⁿbⁿ | n ≥ 1} = { ab, aabb, aaabbb, … } • “same number of b’s as a’s…” • this set is not regular. Why?

  16. The Limits of Finite State Technology • [Formally, we can use the Pumping Lemma to prove this particular case.] • informally, we can build FSA for… • ab • aabb • aaabbb • … • [figure: three separate chain FSAs for ab, aabb and aaabbb, with a legend marking the end state]
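For reference, the Pumping Lemma argument mentioned on this slide can be sketched as follows. This is the standard textbook proof written out in LaTeX; the symbol p for the pumping length is the usual convention and does not come from the slides.

```latex
Suppose $L = \{a^n b^n \mid n \ge 1\}$ were regular, with pumping length $p \ge 1$.

Take $w = a^p b^p \in L$, so $|w| = 2p \ge p$.

The lemma gives a split $w = xyz$ with $|xy| \le p$ and $|y| \ge 1$.
Since the first $p$ characters of $w$ are all $a$, we have $y = a^k$ for some $1 \le k \le p$.

Pumping once more gives $x y^2 z = a^{p+k} b^p$.
Because $p + k \ne p$, this string is not in $L$, contradicting the lemma.
Hence $L$ is not regular.
```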

  17. The Limits of Finite State Technology • we can merge the individual FSA for… • ab • aabb • aaabbb • [figure: merged FSA for ab, aabb and aaabbb] • such direct encoding would require an infinite number of states • and we’re using Finite State Automata • quite different from the infinity obtained by looping • freely iterate (no counting)
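To make the state-count point concrete, the sketch below builds a merged machine for aⁿbⁿ with n capped at a bound k and counts its states. The bound k, the state labels ("A", i) and ("B", r), and the function names are assumptions made for illustration; this is one possible merged encoding, not necessarily the one drawn on the slide.

```python
def bounded_anbn(k):
    """Transition table for {a^n b^n | 1 <= n <= k}; returns (delta, start, finals)."""
    delta = {}
    for i in range(k):
        delta[(("A", i), "a")] = ("A", i + 1)   # one more a has been read
    for i in range(1, k + 1):
        delta[(("A", i), "b")] = ("B", i - 1)   # first b: i - 1 b's still owed
    for r in range(1, k):
        delta[(("B", r), "b")] = ("B", r - 1)   # each further b pays off one a
    return delta, ("A", 0), {("B", 0)}


def state_count(k):
    delta, start, _ = bounded_anbn(k)
    return len({start} | {s for (s, _) in delta} | set(delta.values()))


def accepts(k, word):
    delta, state, finals = bounded_anbn(k)
    for symbol in word:
        state = delta.get((state, symbol))
        if state is None:
            return False
    return state in finals
```

In this encoding state_count(k) grows as 2k + 1, so removing the cap on n would leave no finite bound on the number of states. A loop, by contrast, adds strings without adding states, which is why it can iterate freely but cannot count.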

  18. The Limits of Finite State Technology • example • L = a+b+ = { ab, abb, aab, aabb, aaab, abbb, … } • “one or more a’s followed by one or more b’s” • [figure: six separate chain FSAs for ab, aab, abb, aabb, aaab and abbb] • Note: can be divided into two independent halves • each half can be replaced by iteration

  19. The Limits of Finite State Technology • example • L = a+b+ = { ab, abb, aab, aabb, aaab, abbb, … } • “one or more a’s followed by one or more b’s” • [figure: chain FSAs for strings of a+b+, linked from a start state s0 by ε-transitions, alongside merged variants] • Note: can be divided into two independent halves • each half can be replaced by iteration
