1 / 40

Building Finite-State Machines

Learn about building Finite-State Machines, concatenation, union, composition, and inversion operators. Use a user-friendly tool for constructing FSAs and FSTs without probabilities. Explore regular expression operators and practical examples for NLP tasks.

dohertyj
Download Presentation

Building Finite-State Machines

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Building Finite-State Machines 600.465 - Intro to NLP - J. Eisner

  2. Xerox Finite-State Tool • You’ll use it for homework … • Commercial product (but we have academic license here) • One of several finite-state toolkits available • This one is easiest to use but doesn’t have probabilities • Usage: • Enter a regular expression; it builds FSA or FST • Now type in input string • FSA: It tells you whether it’s accepted • FST: It tells you all the output strings (if any) • Can also invert FST to let you map outputs to inputs • Could hook it up to other NLP tools that need finite-state processing of their input or output 600.465 - Intro to NLP - J. Eisner

  3. Common Regular Expression Operators concatenation EF * +iteration E*, E+ |union E | F &intersection E & F ~ \ - complementation, minus ~E, \x, E-F .x.crossproduct E .x. F .o.composition E .o. F .uupper (input) language E.u “domain” .l.lower (output) language E.l “range” 600.465 - Intro to NLP - J. Eisner

  4. What the Operators Mean • [blackboard discussion] • [Composition is the most interesting case: see following slides.] 600.465 - Intro to NLP - J. Eisner

  5. How to define transducers? • state set Q • initial state i • set of final states F • input alphabet S (also define S*, S+, S?) • output alphabet D • transition function d: Q x S? --> 2Q • output function s: Q x S? x Q --> D* 600.465 - Intro to NLP - J. Eisner

  6. slide courtesy of L. Karttunen (modified) How to implement? concatenation EF * +iteration E*, E+ |union E | F ~ \ - complementation, minus ~E, \x, E-F &intersection E & F .x.crossproduct E .x. F .o.composition E .o. F .uupper (input) language E.u “domain” .o.lower (output) language E.l “range” 600.465 - Intro to NLP - J. Eisner

  7. example courtesy of M. Mohri = Concatenation 600.465 - Intro to NLP - J. Eisner

  8. example courtesy of M. Mohri = Union + 600.465 - Intro to NLP - J. Eisner

  9. example courtesy of M. Mohri = Closure (this example has outputs too) * why add new start state 4? why not just make state 0 final? 600.465 - Intro to NLP - J. Eisner

  10. example courtesy of M. Mohri = Upper language (domain) .u similarly construct lower language .l also called input & output languages 600.465 - Intro to NLP - J. Eisner

  11. example courtesy of M. Mohri .r = Reversal 600.465 - Intro to NLP - J. Eisner

  12. example courtesy of M. Mohri .i = Inversion 600.465 - Intro to NLP - J. Eisner

  13. Complementation • Given a machine M, represent all strings not accepted by M • Just change final states to non-final and vice-versa • Works only if machine has been determinized and completed first (why?) 600.465 - Intro to NLP - J. Eisner

  14. example adapted from M. Mohri 2/0.5 2,0/0.8 2/0.8 2,2/1.3 pig/0.4 fat/0.2 sleeps/1.3 & 0 1 eats/0.6 eats/0.6 fat/0.7 pig/0.7 0,0 0,1 1,1 sleeps/1.9 Intersection fat/0.5 pig/0.3 eats/0 0 1 sleeps/0.6 = 600.465 - Intro to NLP - J. Eisner

  15. fat/0.5 2/0.5 2,2/1.3 2/0.8 2,0/0.8 pig/0.3 eats/0 0 1 sleeps/0.6 pig/0.4 fat/0.2 sleeps/1.3 & 0 1 eats/0.6 eats/0.6 fat/0.7 pig/0.7 0,0 0,1 1,1 sleeps/1.9 Intersection = Paths 0012 and 0110 both accept fat pig eats So must the new machine: along path 0,00,11,12,0 600.465 - Intro to NLP - J. Eisner

  16. 2/0.8 2/0.5 fat/0.7 0,1 Intersection fat/0.5 pig/0.3 eats/0 0 1 sleeps/0.6 pig/0.4 fat/0.2 sleeps/1.3 & 0 1 eats/0.6 = 0,0 Paths 00 and 01 both accept fat So must the new machine: along path 0,00,1 600.465 - Intro to NLP - J. Eisner

  17. 2/0.8 2/0.5 pig/0.7 1,1 Intersection fat/0.5 pig/0.3 eats/0 0 1 sleeps/0.6 pig/0.4 fat/0.2 sleeps/1.3 & 0 1 eats/0.6 fat/0.7 = 0,0 0,1 Paths 00 and 11 both accept pig So must the new machine: along path 0,11,1 600.465 - Intro to NLP - J. Eisner

  18. 2/0.8 2/0.5 2,2/1.3 sleeps/1.9 Intersection fat/0.5 pig/0.3 eats/0 0 1 sleeps/0.6 pig/0.4 fat/0.2 sleeps/1.3 & 0 1 eats/0.6 fat/0.7 pig/0.7 = 0,0 0,1 1,1 Paths 12 and 12 both accept fat So must the new machine: along path 1,12,2 600.465 - Intro to NLP - J. Eisner

  19. 2,2/0.8 2/0.8 2/0.5 eats/0.6 2,0/1.3 Intersection fat/0.5 pig/0.3 eats/0 0 1 sleeps/0.6 pig/0.4 fat/0.2 sleeps/1.3 & 0 1 eats/0.6 fat/0.7 pig/0.7 = 0,0 0,1 1,1 sleeps/1.9 600.465 - Intro to NLP - J. Eisner

  20. g 3 4 abgd 2 2 abed abed 8 6 abd abjd ... What Composition Means f ab?d abcd 600.465 - Intro to NLP - J. Eisner

  21. What Composition Means 4 ab?d abgd 2 abed Relation composition: f  g 8 abd ... 600.465 - Intro to NLP - J. Eisner

  22. does not contain any pair of the form abjd  … g 3 4 abgd 2 2 abed abed 8 6 abd abjd ... Relation = set of pairs ab?d abcd ab?d  abed ab?d  abjd … abcd  abgd abed  abed abed  abd … f ab?d abcd 600.465 - Intro to NLP - J. Eisner

  23. fg fg = {xz: y (xy f and yzg)} where x, y, z are strings Relation = set of pairs ab?d abcd ab?d  abed ab?d  abjd … abcd  abgd abed  abed abed  abd … ab?d abgd ab?d  abed ab?d  abd … 4 ab?d abgd 2 abed 8 abd ... 600.465 - Intro to NLP - J. Eisner

  24. Intersection vs. Composition Intersection pig/0.4 pig/0.3 pig/0.7 & = 0,1 1 0 1 1,1 Composition pig:pink/0.4 Wilbur:pink/0.7 Wilbur:pig/0.3 .o. = 0,1 1 0 1 1,1 600.465 - Intro to NLP - J. Eisner

  25. Intersection vs. Composition Intersection mismatch elephant/0.4 pig/0.3 pig/0.7 & = 0,1 1 0 1 1,1 Composition mismatch elephant:gray/0.4 Wilbur:gray/0.7 Wilbur:pig/0.3 .o. = 0,1 1 0 1 1,1 600.465 - Intro to NLP - J. Eisner

  26. example courtesy of M. Mohri Composition .o. =

  27. Composition .o. = a:b .o. b:b = a:b

  28. Composition .o. = a:b .o. b:a = a:a

  29. Composition .o. = a:b .o. b:a = a:a

  30. Composition .o. = b:b .o. b:a = b:a

  31. Composition .o. = a:b .o. b:a = a:a

  32. Composition .o. = a:a .o. a:b = a:b

  33. Composition .o. = b:b .o. a:b = nothing (since intermediate symbol doesn’t match)

  34. Composition .o. = b:b .o. b:a = b:a

  35. Composition .o. = a:b .o. a:b = a:b

  36. fg fg = {xz: y (xy f and yzg)} where x, y, z are strings Relation = set of pairs ab?d abcd ab?d  abed ab?d  abjd … abcd  abgd abed  abed abed  abd … ab?d abgd ab?d  abed ab?d  abd … 4 ab?d abgd 2 abed 8 abd ... 600.465 - Intro to NLP - J. Eisner

  37. Composition with Sets • We’ve defined A .o. B where both are FSTs • Now extend definition to allow one to be a FSA • Two relations (FSTs):AB = {xz: y (xy A and yzB)} • Set and relation: AB = {xz: x A and xzB } • Relation and set:AB = {xz: xzA and zB } • Two sets (acceptors) – same as intersection:AB = {x: x A and x B } 600.465 - Intro to NLP - J. Eisner

  38. Composition and Coercion • Really just treats a set as identity relation on set {abc, pqr, …} = {abcabc, pqrpqr, …} • Two relations (FSTs):AB = {xz: y (xy A and yzB)} • Set and relation is now special case (if y then y=x): AB = {xz: xxA and xzB } • Relation and set is now special case (if y then y=z): • AB = {xz: xzA and zzB } • Two sets (acceptors) is now special case:AB = {xz: xxA and xxB } 600.465 - Intro to NLP - J. Eisner

  39. 3 Uses of Set Composition: • Feed string into Greek transducer: • {abedabed} .o. Greek = {abedabed,abedabd} • {abed} .o. Greek = {abedabed,abedabd} • [{abed} .o. Greek].l = {abed, abd} • Feed several strings in parallel: • {abcd, abed} .o. Greek = {abcdabgd,abedabed,abedabd} • [{abcd,abed} .o. Greek].l = {abgd, abed, abd} • Filter result via No = {abgd, abd,…} • {abcd,abed} .o. Greek .o. No = {abcdabgd,abedabd} 600.465 - Intro to NLP - J. Eisner

  40. a:e e:x What are the “basic” transducers? • The operations on the previous slides combine transducers into bigger ones • But where do we start? • a:e for a S • e:x for x D • Q: Do we also need a:x? How about e:e ? 600.465 - Intro to NLP - J. Eisner

More Related