Finite State Automata • A very simple and intuitive formalism suitable for certain tasks • A bit like a flow chart, but can be used for both recognition and generation • “Transition network” • Unique start point • Series of states linked by transitions • Transitions represent input to be accounted for, or output to be generated • Legal exit-point(s) explicitly identified
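Example (Jurafsky & Martin, Figure 2.10) [Figure: FSA with states q0 to q4 and arcs b, a, a, a loop labelled a on q3, then !] • The loop on q3 means the automaton can accept strings of unbounded length (baa!, baaa!, baaaa!, ...) • "Deterministic" because in any state there is at most one transition for each input symbol, so its behaviour is fully predictable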
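A minimal Python sketch of the deterministic sheep-talk FSA above, assuming single-character symbols and a transition table keyed by (state, symbol). `DELTA` and `dfa_accepts` are names invented for this illustration. Because a dictionary key maps to a single value, each state has at most one move per symbol, which is exactly the determinism property.

```python
# Transition table: (state, symbol) -> next state.
# One value per key means at most one move per symbol: deterministic.
DELTA = {
    ("q0", "b"): "q1",
    ("q1", "a"): "q2",
    ("q2", "a"): "q3",
    ("q3", "a"): "q3",   # the loop on q3 allows arbitrarily many a's
    ("q3", "!"): "q4",
}
FINALS = {"q4"}

def dfa_accepts(text, start="q0"):
    """Return True if the FSA accepts `text`, read one character at a time."""
    state = start
    for ch in text:
        state = DELTA.get((state, ch))
        if state is None:          # no transition for this symbol: reject
            return False
    return state in FINALS
```

For instance, `dfa_accepts("baaa!")` is True and `dfa_accepts("ba!")` is False.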
Non-deterministic FSA (Jurafsky & Martin, Figure 2.18) [Figure, labelled 2.19: states q0 to q4, arcs b, a, a, ! and an ε arc] • At state q2 with input "a" there is a choice of transitions • We can also have "jump" arcs (empty or ε transitions), which consume no input and also introduce non-determinism
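One standard way to run a non-deterministic FSA without backtracking is to track the set of states it could be in, closing over ε arcs after every step. The sketch below uses hypothetical names and single-character symbols, and encodes both the Figure 2.18 style (two "a" arcs leaving q2) and a variant with an ε arc from q3 back to q2. Both accept the same strings, baa+!.

```python
EPS = "ε"

# Arcs as (from, symbol, to). Two 'a' arcs leave q2: a choice point.
LOOP_NFA = [("q0", "b", "q1"), ("q1", "a", "q2"), ("q2", "a", "q2"),
            ("q2", "a", "q3"), ("q3", "!", "q4")]
# Variant with a jump (ε) arc from q3 back to q2.
EPS_NFA = [("q0", "b", "q1"), ("q1", "a", "q2"), ("q2", "a", "q3"),
           ("q3", EPS, "q2"), ("q3", "!", "q4")]

def eps_closure(states, arcs):
    """All states reachable from `states` using only ε arcs."""
    closure, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for src, sym, dst in arcs:
            if src == state and sym == EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure

def nfa_accepts(arcs, text, start="q0", finals=frozenset({"q4"})):
    """Follow every possible path at once, one input symbol at a time."""
    current = eps_closure({start}, arcs)
    for symbol in text:
        moved = {dst for src, sym, dst in arcs if src in current and sym == symbol}
        current = eps_closure(moved, arcs)
        if not current:            # every path has died
            return False
    return bool(current & finals)
```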
Augmented Transition Networks • ATNs were used for parsing in the 1960s and 1970s • For parsing, you need to pass constraints (e.g. for agreement) as well as account for the input: the transition networks were "augmented" with a "register" into which such information could be put and from which it could later be taken • It is easy to write recognizers, but computing structure is difficult • ATNs quickly become very complex; one solution is to have a "cascade" of ATNs, where transitions can call other networks
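Augmented Transition Networks [Figure: a cascade of networks. The S network's arcs push NP (put "num") and push VP (get "num"); the NP network has arcs det (put "num"), adj, n (put "num") and prep, an ε arc, and exits with pop NP]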
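The register idea can be sketched as a small recursive transition network in Python. This is a toy, not the ATN in the figure: the lexicon, state names and arc encoding are all hypothetical, and only number agreement is modelled. A cat arc tagged put stores the word's number in the current network's register, a push arc hands the sub-network's number back to its caller, and get checks it against the stored value, so a clash blocks that path.

```python
# Hypothetical lexicon: word -> (category, number); None = no number feature.
LEXICON = {
    "this": ("det", "sg"), "these": ("det", "pl"),
    "big": ("adj", None),
    "dog": ("n", "sg"), "dogs": ("n", "pl"),
    "barks": ("v", "sg"), "bark": ("v", "pl"),
}

# Hypothetical arc encoding, per network: state -> list of arcs.
#   ("cat", C, action, next)    consume one word of category C
#   ("push", NET, action, next) call sub-network NET and receive its "num"
#   ("pop",)                    leave the network, returning its "num"
# action: "put" stores num in the register, "get" checks it, None ignores it.
NETWORKS = {
    "S":  {"s0": [("push", "NP", "put", "s1")],
           "s1": [("push", "VP", "get", "s2")],
           "s2": [("pop",)]},
    "NP": {"n0": [("cat", "det", "put", "n1")],
           "n1": [("cat", "adj", None, "n1"), ("cat", "n", "put", "n2")],
           "n2": [("pop",)]},
    "VP": {"v0": [("cat", "v", "put", "v1")],
           "v1": [("pop",)]},
}
START = {"S": "s0", "NP": "n0", "VP": "v0"}
FAIL = object()  # sentinel for an agreement clash

def update(register, value, action):
    """Apply a put/get action to the register holding 'num'."""
    if action is None or value is None:
        return register
    if register is not None and register != value:
        return FAIL                      # e.g. sg determiner with a pl noun
    return value if action == "put" else register

def walk(net, state, words, i, register):
    """Yield (position, num) for every way to reach a pop arc from `state`."""
    for arc in NETWORKS[net][state]:
        if arc[0] == "pop":
            yield i, register
            continue
        kind, label, action, nxt = arc
        if kind == "cat":
            entry = LEXICON.get(words[i]) if i < len(words) else None
            if entry and entry[0] == label:
                new = update(register, entry[1], action)
                if new is not FAIL:
                    yield from walk(net, nxt, words, i + 1, new)
        else:  # push: run the sub-network with a fresh register
            for j, sub_num in walk(label, START[label], words, i, None):
                new = update(register, sub_num, action)
                if new is not FAIL:
                    yield from walk(net, nxt, words, j, new)

def parse(sentence):
    """True if the S network can consume the whole sentence."""
    words = sentence.split()
    return any(j == len(words) for j, _ in walk("S", START["S"], words, 0, None))
```

With this toy grammar, `parse("these big dogs bark")` succeeds, while `parse("this dogs bark")` fails inside NP and `parse("these dogs barks")` fails at the get on the VP arc.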
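Exercises [Figure: the deterministic FSA, states q0 to q4, arcs b, a, a, a loop labelled a on q3, then !] • Arcs as [From,Symbol,To] triples: [0,b,1] [1,a,2] [2,a,3] [3,a,3] [3,!,end] • As a Prolog fact: fsa([[0,b,1],[1,a,2],[2,a,3],[3,a,3],[3,!,end]]).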
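NDFSA [Figure: states q0 to q4, arcs b, a, a, ! and an ε arc from q3 back to q2] • Arcs: [0,b,1] [1,a,2] [2,a,3] [3,!,end] [3,empty,2], with the ε arc written as empty • As a Prolog fact: fsa([[0,b,1],[1,a,2],[2,a,3],[3,empty,2],[3,!,end]]).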
FSA and NDFSA programs
First load (consult) the file, e.g. 219.pl
| ?- help.
Options are as follows
run - a simple recognizer; on prompt type in string with space between each element, ending in . or ! or ?
run(v) - verbose recognizer; gives a trace of transitions
gen(X) - generate text; will interact at choice points
rec(X,quiet) - to generate text deterministically. Type ; to get other grammatical sequences
| ?- run.
Enter your string: b a a a a !
yes
FSA and NDFSA programs
| ?- run(v).
Enter your string: b a a a a !
0-b-1 1-a-2 2-a-3 3-skip-2 2-a-3 3-skip-2 2-a-3 3-skip-2 3-!-end
yes
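The trace above looks like depth-first search with backtracking. The following Python sketch is a hypothetical re-creation, not the code in 219.pl: it tries arcs in fsa/1 order and logs every attempt. On b a a a a ! it produces the same sequence, including the third 3-skip-2, which is abandoned because ! cannot be read from state 2.

```python
# The NDFSA as (From, Symbol, To) triples, in fsa/1 order; "empty" is the jump arc.
NDFSA = [(0, "b", 1), (1, "a", 2), (2, "a", 3), (3, "empty", 2), (3, "!", "end")]

def recognize(arcs, tokens, start=0, final="end", trace=None):
    """Depth-first, backtracking recognizer.

    Every arc tried is appended to `trace` as From-Symbol-To, with empty arcs
    shown as "skip"; arcs on paths that later fail stay in the trace.
    Assumes there is no cycle made only of empty arcs (that would loop forever).
    """
    def walk(state, i):
        if state == final and i == len(tokens):
            return True
        for src, sym, dst in arcs:
            if src != state:
                continue
            if sym == "empty":
                label, j = "skip", i            # jump arc: consume nothing
            elif i < len(tokens) and tokens[i] == sym:
                label, j = sym, i + 1           # consume one token
            else:
                continue
            if trace is not None:
                trace.append(f"{src}-{label}-{dst}")
            if walk(dst, j):
                return True
        return False

    return walk(start, 0)
```

Usage: `steps = []; recognize(NDFSA, "b a a a a !".split(), trace=steps)` returns True, and `" ".join(steps)` is the trace shown above.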
FSA and NDFSA programs
| ?- gen(X).
Choice at state 3. Choose state from
(1) [!,end]
(2) [empty,2]
Select choice number: 2.
Choice at state 3. Choose state from
(1) [!,end]
(2) [empty,2]
Select choice number: 2.
Choice at state 3. Choose state from
(1) [!,end]
(2) [empty,2]
Select choice number: 1.
X = [b,a,a,a,a,!] ?
yes
FSA and NDFSA programs
| ?- rec(X,quiet).
X = [b,a,a] ? ;
X = [b,a,a,a] ? ;
X = [b,a,a,a,a] ? ;
X = [b,a,a,a,a,a] ?
yes
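rec(X,quiet) lists accepted strings from shortest to longest. A breadth-first search gives that ordering, whereas a naive depth-first search would follow the a loop forever. This Python sketch uses invented names and the deterministic fsa/1 arcs, so each string is produced exactly once; its strings keep the final !.

```python
from collections import deque
from itertools import islice

# The deterministic fsa/1 arcs as (From, Symbol, To) triples.
FSA = [(0, "b", 1), (1, "a", 2), (2, "a", 3), (3, "a", 3), (3, "!", "end")]

def generate(arcs, start=0, final="end"):
    """Yield accepted strings shortest-first (breadth-first search).
    The language is infinite, so take a finite slice of the generator."""
    queue = deque([(start, [])])
    while queue:
        state, path = queue.popleft()
        if state == final:
            yield path
        for src, sym, dst in arcs:
            if src == state:
                queue.append((dst, path + [sym]))
```

Usage: `list(islice(generate(FSA), 3))` gives the three shortest strings, b a a !, b a a a ! and b a a a a !.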
FSAs and regular expressions • FSAs are closely related to "regular expressions": every regular expression describes a set of strings that some FSA accepts, and vice versa • Regular expressions are a notation for describing sets of strings, used mainly for searching text or for specifying the patterns that strings must follow • Regular expressions are built from literal characters combined with special operators
Regular expressions
Character   Meaning                      Examples
[ ]         alternatives                 /[aeiou]/, /m[ae]n/
            range                        /[a-z]/
[^ ]        not                          /[^pbm]/, /[^ox]s/
?           optionality                  /Kath?mandu/
*           zero or more                 /baa*!/
+           one or more                  /ba+!/
.           any character                /cat.[aeiou]/
^, $        start, end of line
\           escape a special character   \.  \?  \^
|           alternate strings            /cat|dog/
( )         substring                    /cit(y|ies)/
etc.
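The table's examples can be checked with Python's `re` module, whose syntax agrees with the notation here for these operators. `re.fullmatch` requires the whole string to match, which has the same effect as wrapping a pattern in ^ and $. The test strings and the `check` helper are illustrative choices, not part of the table.

```python
import re

# (pattern, strings it should fully match, strings it should not)
EXAMPLES = [
    (r"m[ae]n", ["man", "men"], ["min", "mn"]),
    (r"[a-z]", ["q"], ["Q", "7"]),
    (r"[^pbm]", ["c"], ["p", "b", "m"]),
    (r"[^ox]s", ["as", "is"], ["os", "xs"]),
    (r"Kath?mandu", ["Kathmandu", "Katmandu"], ["Kathhmandu"]),
    (r"baa*!", ["ba!", "baaa!"], ["b!"]),
    (r"ba+!", ["ba!", "baaaa!"], ["b!"]),
    (r"cat.[aeiou]", ["catso", "cat a"], ["cata"]),
    (r"\.\?\^", [".?^"], ["a?^"]),
    (r"cat|dog", ["cat", "dog"], ["cow"]),
    (r"cit(y|ies)", ["city", "cities"], ["cit"]),
]

def check(examples):
    """Assert that each pattern accepts and rejects the listed strings."""
    for pattern, good, bad in examples:
        for s in good:
            assert re.fullmatch(pattern, s), (pattern, s)
        for s in bad:
            assert not re.fullmatch(pattern, s), (pattern, s)
    return True
```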
Regular expressions • A regular expression can be mapped onto an FSA • Can be a good way of handling morphology • Especially in connection with Finite State Transducers
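As an illustration of the regex/FSA correspondence, this Python sketch compares /baa+!/ with the deterministic sheep-talk FSA on every string over {a, b, !} up to a chosen length. An exhaustive check like this is evidence, not a proof; the general method is to convert the regular expression into an automaton. `agree_up_to` is a hypothetical helper name.

```python
import re
from itertools import product

# Deterministic sheep-talk FSA: (state, symbol) -> next state.
DELTA = {("q0", "b"): "q1", ("q1", "a"): "q2", ("q2", "a"): "q3",
         ("q3", "a"): "q3", ("q3", "!"): "q4"}

def dfa_accepts(text):
    state = "q0"
    for ch in text:
        state = DELTA.get((state, ch))
        if state is None:
            return False
    return state == "q4"

def agree_up_to(pattern, alphabet, max_len):
    """Return the first string (shortest first) on which the regex and the
    FSA disagree, or None if they agree on all strings up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(pattern, s)) != dfa_accepts(s):
                return s
    return None
```

`agree_up_to(r"baa+!", "ab!", 6)` returns None, while the looser /baa*!/ from the table is caught at once: `agree_up_to(r"baa*!", "ab!", 6)` returns "ba!".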
Finite State Transducers • A “transducer” defines a relationship (a mapping) between two things • Typically used for “two-level morphology”, but can be used for other things • Like an FSA, but each state transition stipulates a pair of symbols, and thus a mapping
Finite State Transducers • Three functions: • Recognizer (verification): takes a pair of strings and verifies if the FST is able to map them onto each other • Generator (synthesis): can generate a legal pair of strings • Translator (transduction): given one string, can generate the corresponding string
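A toy Python sketch of the three functions. It assumes a hypothetical transducer that rewrites sheep talk into cow talk (b:m, a:o, !:!) and has no ε arcs, so both sides of every pair have the same length. `recognize`, `generate` and `translate` correspond to verification, synthesis and transduction; `invert=True` runs the transducer in the other direction.

```python
# Hypothetical transducer: state -> [(upper, lower, next_state)]; no ε arcs.
FST = {
    0: [("b", "m", 1)],
    1: [("a", "o", 2)],
    2: [("a", "o", 3)],
    3: [("a", "o", 3), ("!", "!", 4)],
    4: [],
}
START, FINALS = 0, {4}

def recognize(upper, lower):
    """Verification: can the FST map `upper` onto `lower`?"""
    if len(upper) != len(lower):      # no ε arcs, so lengths must match
        return False
    states = {START}
    for u, l in zip(upper, lower):
        states = {nxt for s in states for (a, b, nxt) in FST[s] if a == u and b == l}
    return bool(states & FINALS)

def generate(max_len):
    """Synthesis: every legal (upper, lower) pair of length <= max_len."""
    pairs = []
    def walk(state, up, low):
        if state in FINALS:
            pairs.append((up, low))
        if len(up) < max_len:
            for a, b, nxt in FST[state]:
                walk(nxt, up + a, low + b)
    walk(START, "", "")
    return pairs

def translate(text, invert=False):
    """Transduction: all strings the FST pairs with `text`."""
    results = []
    def walk(state, i, out):
        if i == len(text):
            if state in FINALS:
                results.append(out)
            return
        for a, b, nxt in FST[state]:
            src, dst = (b, a) if invert else (a, b)
            if src == text[i]:
                walk(nxt, i + 1, out + dst)
    walk(START, 0, "")
    return results
```

For example, `translate("baa!")` returns `["moo!"]` and `translate("mooo!", invert=True)` returns `["baaa!"]`.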
Some conventions • Transitions are marked by “:” • A non-changing transition “x:x” can be shown simply as “x” • Wild-cards are shown as “@” • Empty string shown as “ε”
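A small Python sketch of these labelling conventions. Two points are assumptions rather than part of the slide: ":" never occurs as a symbol itself, and a bare `@` (that is, `@:@`) copies whichever symbol it matched.

```python
EPS = "ε"    # empty string
WILD = "@"   # wildcard

def parse_label(label):
    """Split an arc label into an (upper, lower) pair; ε becomes ""."""
    if ":" in label:
        upper, lower = label.split(":", 1)
    else:
        upper = lower = label          # "x" abbreviates "x:x"
    return ("" if upper == EPS else upper, "" if lower == EPS else lower)

def apply_label(label, symbol):
    """Output this arc writes when it reads `symbol`, or None if it cannot
    read it. An ε on the upper side never matches a real (non-empty) symbol."""
    upper, lower = parse_label(label)
    if upper == WILD:
        return symbol if lower == WILD else lower
    return lower if upper == symbol else None
```

For example, `parse_label("N:ε")` gives `("N", "")` and `apply_label("o:e", "o")` gives `"e"`.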
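An example (J&M Fig. 3.9, p.74) [Figure: lexical:intermediate transducer. Regular nouns fox, cat, dog run from q0 to q1, then N:ε to q4, then P:^ s # or S:# to q7; irregular singular nouns goose, sheep, mouse run from q0 to q2, then N:ε to q5 and S:# to q7; irregular plural nouns g o:e o:e s e, sheep, m o:i u:ε s:c e run from q0 to q3, then N:ε to q6 and P:# to q7]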
[0] f:f o:o x:x [1] N:ε [4] P:^ s:s #:# [7] • [0] f:f o:o x:x [1] N:ε [4] S:# [7] • [0] c:c a:a t:t [1] N:ε [4] P:^ s:s #:# [7] • [0] s:s h:h e:e e:e p:p [2] N:ε [5] S:# [7] • [0] g:g o:e o:e s:s e:e [3] N:ε [6] P:# [7] • Resulting lexical:intermediate pairs: f o x N P s # : f o x ^ s # • f o x N S : f o x # • c a t N P s # : c a t ^ s # • s h e e p N S : s h e e p # • g o o s e N P : g e e s e #
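A Python sketch that encodes the figure's paths as arcs and looks up the intermediate form of a lexical string. Symbols are single characters (N, P and S stand for the tags), ε is the empty string, and the intermediate state names are made up for this illustration. The regular-plural branch follows the slide's path listing, so its lexical side ends in s #.

```python
from collections import defaultdict

EPS = ""  # ε: the empty string

def build_fst():
    """Encode the lexical:intermediate transducer of J&M Fig. 3.9 as
    state -> [(lexical, intermediate, next_state)]. Per-word state names
    such as "fox.0" are invented for this sketch."""
    arcs = defaultdict(list)

    def chain(start, pairs, end, tag):
        # Link a run of symbol pairs from `start` to `end` via fresh states.
        state = start
        for k, (lex, out) in enumerate(pairs):
            nxt = end if k == len(pairs) - 1 else f"{tag}.{k}"
            arcs[state].append((lex, out, nxt))
            state = nxt

    same = lambda word: [(c, c) for c in word]   # "x" abbreviates x:x
    for w in ("fox", "cat", "dog"):              # regular nouns
        chain("q0", same(w), "q1", w)
    for w in ("goose", "sheep", "mouse"):        # irregular singulars
        chain("q0", same(w), "q2", w + ".sg")
    chain("q0", [("g", "g"), ("o", "e"), ("o", "e"), ("s", "s"), ("e", "e")], "q3", "geese")
    chain("q0", same("sheep"), "q3", "sheep.pl")
    chain("q0", [("m", "m"), ("o", "i"), ("u", EPS), ("s", "c"), ("e", "e")], "q3", "mice")
    for src, dst in (("q1", "q4"), ("q2", "q5"), ("q3", "q6")):
        chain(src, [("N", EPS)], dst, src + "N")
    chain("q4", [("P", "^"), ("s", "s"), ("#", "#")], "q7", "q4P")
    chain("q4", [("S", "#")], "q7", "q4S")
    chain("q5", [("S", "#")], "q7", "q5S")
    chain("q6", [("P", "#")], "q7", "q6P")
    return arcs

def transduce(arcs, lexical, start="q0", final="q7"):
    """Every intermediate string the FST pairs with `lexical`. Each arc
    consumes exactly one lexical symbol, so the search always terminates."""
    results = []

    def walk(state, i, out):
        if i == len(lexical):
            if state == final:
                results.append(out)
            return
        for lex, o, nxt in arcs.get(state, []):
            if lex == lexical[i]:
                walk(nxt, i + 1, out + o)

    walk(start, 0, "")
    return results

ARCS = build_fst()
```

Usage: `transduce(ARCS, "foxNPs#")` returns `["fox^s#"]` and `transduce(ARCS, "mouseNP")` returns `["mice#"]`.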
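Lexical:surface mapping (J&M Fig. 3.14, p.78) [Figure: E-insertion transducer with states q0 to q5 and arcs labelled z, s, x, ^:ε, ε:e, s, #, other] • f o x N P s # : f o x ^ s # • c a t N P s # : c a t ^ s # • Rule: ε → e / {x, s, z} ^ __ s #, i.e. insert e after a stem-final x, s or z and the morpheme boundary ^ when s# follows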
[0] f:f [0] o:o [0] x:x [1] ^:ε [2] ε:e [3] s:s [4] #:# [0] • [0] c:c [0] a:a [0] t:t [0] ^:ε [0] s:s [0] #:# [0] • f o x ^ s # : f o x e s # • c a t ^ s # : c a t s #
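The E-insertion rule can also be applied directly as a string rewrite, which is a handy cross-check on the transducer's paths. This Python sketch uses regular-expression substitution rather than an FST: it inserts e after a stem-final x, s or z followed by ^ and s#, then deletes every ^. The function name is hypothetical.

```python
import re

def e_insertion(intermediate):
    """Intermediate -> surface via the rule  ε -> e / {x, s, z} ^ __ s #,
    applied as a regex substitution (a stand-in for the FST, not the FST)."""
    # Insert e between ^ and s# when the stem ends in x, s or z.
    with_e = re.sub(r"([xsz])\^(?=s#)", r"\1^e", intermediate)
    # Every morpheme boundary ^ maps to ε on the surface.
    return with_e.replace("^", "")
```

Usage: `e_insertion("fox^s#")` returns `"foxes#"`, `e_insertion("cat^s#")` returns `"cats#"`, and `e_insertion("kiss^s#")` returns `"kisses#"`.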
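FST • FSTs can be compiled automatically from higher-level descriptions, such as regular expressions over pairs of strings • The compiler's input and output therefore use a slightly different formalism from the hand-drawn diagrams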
FST compiler
http://www.xrce.xerox.com/competencies/content-analysis/fsCompiler/fsinput.html
[Figure: start state s0 with arcs c, d, f, g to states s1, s2, s3, s4]
Input (.x. is the cross-product operator, pairing an upper string with a lower string):
[d o g N P .x. d o g s ] | [c a t N P .x. c a t s ] | [f o x N P .x. f o x e s ] | [g o o s e N P .x. g e e s e]
Output (0 denotes ε):
s0: c -> s1, d -> s2, f -> s3, g -> s4.
s1: a -> s5.
s2: o -> s6.
s3: o -> s7.
s4: <o:e> -> s8.
s5: t -> s9.
s6: g -> s9.
s7: x -> s10.
s8: <o:e> -> s11.
s9: <N:s> -> s12.
s10: <N:e> -> s13.
s11: s -> s14.
s12: <P:0> -> fs15.
s13: <P:s> -> fs15.
s14: e -> s16.
fs15: (no arcs)
s16: <N:0> -> s12.
The compiled transducer as a Prolog term: fst([ [s0,[c,s1], [d,s2], [f,s3], [g,s4]], [s1,[a,s5]], [s2,[o,s6]], [s3,[o,s7]], [s4,[[o,e],s8]], [s5,[t,s9]], [s6,[g,s9]], [s7,[x,s10]], [s8,[[o,e],s11]], [s9,[['N',s],s12]], [s10,[['N',e],s13]], [s11,[s,s14]], [s12,[['P',0],fs15]], [s13,[['P',s],fs15]], [s14,[e,s16]], [fs15, noarcs], [s16,[['N',0],s12]] ]).
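A Python sketch that reads the compiler listing back into arcs and runs it on the upper (lexical) side. It assumes the listing format shown on the slide: `<x:y>` for a symbol pair, a bare symbol for x:x, 0 for ε, and an `fs` prefix marking a final state. The function names are invented for this illustration.

```python
import re

# The compiler output from the slide, one state per line.
LISTING = """\
s0: c -> s1, d -> s2, f -> s3, g -> s4.
s1: a -> s5.
s2: o -> s6.
s3: o -> s7.
s4: <o:e> -> s8.
s5: t -> s9.
s6: g -> s9.
s7: x -> s10.
s8: <o:e> -> s11.
s9: <N:s> -> s12.
s10: <N:e> -> s13.
s11: s -> s14.
s12: <P:0> -> fs15.
s13: <P:s> -> fs15.
s14: e -> s16.
fs15: (no arcs)
s16: <N:0> -> s12.
"""

def parse_listing(text):
    """Return (arcs, finals) with arcs[state] = [(upper, lower, next_state)]."""
    arcs, finals = {}, set()
    for line in text.strip().splitlines():
        state, _, rest = line.partition(": ")
        arcs[state] = []
        if state.startswith("fs"):                  # assumed final-state marker
            finals.add(state)
        for label, target in re.findall(r"(<[^>]*>|\S+) -> (\w+)", rest):
            if label.startswith("<"):               # <upper:lower> pair
                upper, lower = label[1:-1].split(":")
            else:                                   # bare symbol means x:x
                upper = lower = label
            arcs[state].append((upper, "" if lower == "0" else lower, target))  # 0 = ε
    return arcs, finals

def transduce(arcs, finals, symbols, start="s0"):
    """All lower-side strings paired with the upper-side `symbols`."""
    outputs = []
    def walk(state, i, out):
        if i == len(symbols):
            if state in finals:
                outputs.append(out)
            return
        for upper, lower, nxt in arcs.get(state, []):
            if upper == symbols[i]:
                walk(nxt, i + 1, out + lower)
    walk(start, 0, "")
    return outputs

ARCS, FINALS = parse_listing(LISTING)
```

Usage: `transduce(ARCS, FINALS, list("foxNP"))` returns `["foxes"]` and `transduce(ARCS, FINALS, list("gooseNP"))` returns `["geese"]`.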
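FST 3.9 [Figure: J&M Fig. 3.9 redrawn with start state s0 and the tags written PL and SG. Regular nouns fox, cat, dog run to q1, then N:ε to q4, then PL:^ s # or SG:# to q7; irregular singulars goose, sheep, mouse run to q2, then N:ε to q5 and SG:# to q7; irregular plurals g o:e o:e s e, sheep, m o:i u:ε s:c e run to q3, then N:ε to q6 and PL:# to q7]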
FST 3.9 (portion) [Figure: the regular-noun branch from s0 to q1: s0 f s1 o s2 x q1; s0 c s3 a s4 t q1; s0 d s5 o s6 g q1] • As Prolog clauses: [s0,[f,s1], [c,s3], [d,s5]], [s1,[o,s2]], [s2,[x,q1]], [s3,[a,s4]], [s4,[t,q1]], [s5,[o,s6]], [s6,[g,q1]]