Building Finite-State Machines 600.465 - Intro to NLP - J. Eisner
Xerox Finite-State Tool
• You’ll use it for homework …
• Commercial product (but we have an academic license here)
• One of several finite-state toolkits available
• This one is easiest to use but doesn’t have probabilities
• Usage:
  • Enter a regular expression; it builds an FSA or FST
  • Now type in an input string
    • FSA: it tells you whether the string is accepted
    • FST: it tells you all the output strings (if any)
  • Can also invert the FST to let you map outputs to inputs
• Could hook it up to other NLP tools that need finite-state processing of their input or output
Common Regular Expression Operators
          concatenation              E F
  * +     iteration                  E*, E+
  |       union                      E | F
  &       intersection               E & F
  ~ \ -   complementation, minus     ~E, \x, E-F
  .x.     crossproduct               E .x. F
  .o.     composition                E .o. F
  .u      upper (input) language     E.u    ("domain")
  .l      lower (output) language    E.l    ("range")
What the Operators Mean
• [blackboard discussion]
• [Composition is the most interesting case: see following slides.]
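Since the blackboard discussion isn’t reproduced here, here is a minimal Python sketch (not part of the original slides) of what a few of these operators denote, using small finite languages written out as explicit sets of strings; the sets E and F are made up for illustration, and real finite-state toolkits of course operate on automata rather than enumerated sets.

  # Regular-expression operators viewed as operations on (finite) languages.
  E = {"ab", "cd"}        # hypothetical language E
  F = {"cd", "ef"}        # hypothetical language F

  concat = {e + f for e in E for f in F}   # E F    -> {'abcd', 'abef', 'cdcd', 'cdef'}
  union  = E | F                           # E | F  -> {'ab', 'cd', 'ef'}
  inter  = E & F                           # E & F  -> {'cd'}
  minus  = E - F                           # E - F  -> {'ab'}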
How to define transducers?
• state set Q
• initial state i
• set of final states F
• input alphabet Σ (also define Σ*, Σ+, Σ∪{ε})
• output alphabet Δ
• transition function δ: Q x (Σ∪{ε}) → 2^Q
• output function σ: Q x (Σ∪{ε}) x Q → Δ*
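A small Python sketch of this definition, for a made-up toy transducer; the transition function δ and the output function σ are folded into a single arc table mapping (state, input symbol, with "" standing for ε) to a list of (next state, output string) pairs. The transducer and all names here are illustrative only.

  from collections import deque

  ARCS = {                      # toy transducer: maps "ab" to "xy", "a" to "xz"
      (0, "a"): [(1, "x")],
      (1, "b"): [(2, "y")],
      (1, ""):  [(2, "z")],     # an epsilon-input arc
  }
  START, FINALS = 0, {2}

  def transduce(s):
      """Return the set of output strings the toy FST assigns to input s
      (assumes no epsilon cycles)."""
      outputs = set()
      agenda = deque([(START, 0, "")])          # (state, position in s, output so far)
      while agenda:
          q, i, out = agenda.popleft()
          if i == len(s) and q in FINALS:
              outputs.add(out)
          if i < len(s):                        # consume one input symbol
              for q2, w in ARCS.get((q, s[i]), []):
                  agenda.append((q2, i + 1, out + w))
          for q2, w in ARCS.get((q, ""), []):   # follow epsilon-input arcs
              agenda.append((q2, i, out + w))
      return outputs

  print(transduce("ab"))   # {'xy'}
  print(transduce("a"))    # {'xz'}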
How to implement?   [slide courtesy of L. Karttunen (modified); the same operator inventory as above]
          concatenation              E F
  * +     iteration                  E*, E+
  |       union                      E | F
  ~ \ -   complementation, minus     ~E, \x, E-F
  &       intersection               E & F
  .x.     crossproduct               E .x. F
  .o.     composition                E .o. F
  .u      upper (input) language     E.u    ("domain")
  .l      lower (output) language    E.l    ("range")
Concatenation   [figure omitted; example courtesy of M. Mohri]
Union   [figure omitted; example courtesy of M. Mohri]
Closure (this example has outputs too)   [figure omitted; example courtesy of M. Mohri]
• why add new start state 4?  why not just make state 0 final?
Upper language (domain), .u   [figure omitted; example courtesy of M. Mohri]
• similarly construct the lower language, .l
• also called the input & output languages
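A quick sketch of .u and .l for a relation given as a finite set of (input, output) pairs; the pairs below anticipate the Greek-letter example used later in these slides.

  R = {("abcd", "abgd"), ("abed", "abed"), ("abed", "abd")}   # toy relation

  upper = {x for x, y in R}   # R.u -> {'abcd', 'abed'}
  lower = {y for x, y in R}   # R.l -> {'abgd', 'abed', 'abd'}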
Reversal, .r   [figure omitted; example courtesy of M. Mohri]
Inversion, .i   [figure omitted; example courtesy of M. Mohri]
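Under the same finite-set-of-pairs view, inversion and reversal are one-liners (names are illustrative only):

  R = {("abcd", "abgd"), ("abed", "abd")}

  inverted  = {(y, x) for x, y in R}               # R.i: swap input and output
  reversed_ = {(x[::-1], y[::-1]) for x, y in R}   # R.r: reverse both strings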
Complementation
• Given a machine M, represent all strings not accepted by M
• Just change final states to non-final and vice-versa
• Works only if the machine has been determinized and completed first (why?)
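A minimal sketch of the point about determinizing and completing first, using a made-up complete DFA over {a, b}: once every (state, symbol) pair has exactly one outgoing arc, swapping final and non-final states complements the language.

  STATES = {0, 1, 2}
  DELTA = {                       # complete: every (state, symbol) is defined
      (0, "a"): 1, (0, "b"): 2,
      (1, "a"): 2, (1, "b"): 2,
      (2, "a"): 2, (2, "b"): 2,   # state 2 acts as a dead (sink) state
  }
  START, FINALS = 0, {1}          # this DFA accepts exactly the string "a"

  def accepts(finals, s):
      q = START
      for c in s:
          q = DELTA[(q, c)]       # deterministic and complete: always defined
      return q in finals

  COMPLEMENT_FINALS = STATES - FINALS   # swap final and non-final states

  print(accepts(FINALS, "a"), accepts(COMPLEMENT_FINALS, "a"))    # True False
  print(accepts(FINALS, "ab"), accepts(COMPLEMENT_FINALS, "ab"))  # False True

If the machine were nondeterministic or had missing arcs, swapping finals would not give the complement, which is one answer to the "why?" above.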
Intersection   [figures omitted; example adapted from M. Mohri.  Two weighted FSAs over fat, pig, eats, sleeps are intersected; matching arcs keep their label and add their weights.]
• Paths 0→0→1→2 and 0→1→1→0 both accept fat pig eats, so the new machine must too: along path (0,0)→(0,1)→(1,1)→(2,0).
• Paths 0→0 and 0→1 both accept fat; so must the new machine: along arc (0,0)→(0,1), labeled fat/0.7 (= 0.5 + 0.2).
• Paths 0→1 and 1→1 both accept pig; so must the new machine: along arc (0,1)→(1,1), labeled pig/0.7 (= 0.3 + 0.4).
• Paths 1→2 and 1→2 both accept sleeps; so must the new machine: along arc (1,1)→(2,2), labeled sleeps/1.9 (= 0.6 + 1.3).
• The remaining arc, eats/0.6 (= 0 + 0.6), from (1,1) to (2,0), together with the final-state weights, completes the intersected machine.  (A small sketch of this product construction follows.)
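A rough Python sketch of this product construction (arc labels and weights taken from the example above; state names are written as strings, final-state weights are omitted since the figure is not reproduced here, and the function name intersect is made up):

  A_ARCS = {("0", "fat"): ("0", 0.5), ("0", "pig"): ("1", 0.3),
            ("1", "eats"): ("2", 0.0), ("1", "sleeps"): ("2", 0.6)}
  B_ARCS = {("0", "fat"): ("1", 0.2), ("1", "pig"): ("1", 0.4),
            ("1", "eats"): ("0", 0.6), ("1", "sleeps"): ("2", 1.3)}

  def intersect(a_arcs, b_arcs):
      # Product construction: an arc survives only if BOTH machines have an
      # arc with the same label; its weight is the SUM of the two weights.
      arcs = {}
      for (qa, sym), (ra, wa) in a_arcs.items():
          for (qb, sym2), (rb, wb) in b_arcs.items():
              if sym == sym2:
                  arcs[((qa, qb), sym)] = ((ra, rb), wa + wb)
      return arcs

  for (src, sym), (dst, w) in sorted(intersect(A_ARCS, B_ARCS).items()):
      print(f"{src} --{sym}/{w:.1f}--> {dst}")
  # ('0', '0') --fat/0.7--> ('0', '1')
  # ('0', '1') --pig/0.7--> ('1', '1')
  # ('1', '1') --eats/0.6--> ('2', '0')
  # ('1', '1') --sleeps/1.9--> ('2', '2')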
What Composition Means   [figures omitted: f maps the input ab?d to outputs abcd, abed, abjd, ...; g then maps abcd to abgd and abed to abed or abd, but has no output for abjd.  The composed relation f∘g therefore maps ab?d directly to abgd, abed, abd, ...]
Relation = set of pairs:
  f = { (ab?d, abcd), (ab?d, abed), (ab?d, abjd), … }
  g = { (abcd, abgd), (abed, abed), (abed, abd), … }   (g does not contain any pair of the form (abjd, …))
Relation composition:  f∘g = { (x, z) : ∃y  (x, y) ∈ f and (y, z) ∈ g },  where x, y, z are strings
  f∘g = { (ab?d, abgd), (ab?d, abed), (ab?d, abd), … }
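A direct Python transcription of this definition for relations written as finite sets of pairs, using the f and g from the example (weights are ignored in this sketch):

  f = {("ab?d", "abcd"), ("ab?d", "abed"), ("ab?d", "abjd")}
  g = {("abcd", "abgd"), ("abed", "abed"), ("abed", "abd")}   # no pair has input abjd

  def compose(r1, r2):
      """r1 o r2 = {(x, z) : some y has (x, y) in r1 and (y, z) in r2}."""
      return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

  print(compose(f, g))
  # {('ab?d', 'abgd'), ('ab?d', 'abed'), ('ab?d', 'abd')}   (in some order; abjd drops out)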
Intersection vs. Composition   [figures omitted]
• Intersection:  pig/0.3  &  pig/0.4  =  pig/0.7, along arc (0,1)→(1,1)
• Composition:  Wilbur:pig/0.3  .o.  pig:pink/0.4  =  Wilbur:pink/0.7, along arc (0,1)→(1,1)
Intersection vs. Composition: mismatch
• Intersection mismatch:  pig/0.3  &  elephant/0.4  =  no arc (the labels do not match)
• Composition mismatch:  Wilbur:pig/0.3  .o.  elephant:gray/0.4  =  no arc (the intermediate symbols pig and elephant do not match, so no Wilbur:gray arc is produced)
Composition   [figures omitted; example courtesy of M. Mohri.  The composed machine is built arc by arc: wherever the first transducer has an arc x:y and the second has an arc y:z, the result gets an arc x:z.  Some symbol combinations recur because they arise at several state pairs.]
• a:b .o. b:b = a:b
• a:b .o. b:a = a:a
• a:b .o. b:a = a:a
• b:b .o. b:a = b:a
• a:b .o. b:a = a:a
• a:a .o. a:b = a:b
• b:b .o. a:b = nothing (since the intermediate symbol doesn’t match)
• b:b .o. b:a = b:a
• a:b .o. a:b = a:b
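A minimal sketch of the arc-matching rule these steps illustrate, assuming epsilon-free transducers given as made-up lists of (source, input, output, target) arcs:

  def compose_arcs(t1, t2):
      """Keep an arc a:c at state pair (q1, q2) whenever t1 has a:b at q1
      and t2 has b:c at q2 for the same intermediate symbol b."""
      arcs = []
      for (q1, a, b, r1) in t1:
          for (q2, b2, c, r2) in t2:
              if b == b2:                      # intermediate symbols must match
                  arcs.append(((q1, q2), a, c, (r1, r2)))
      return arcs

  T1 = [(0, "a", "b", 1), (1, "b", "b", 1)]    # illustrative arcs only
  T2 = [(0, "b", "a", 0), (0, "a", "b", 1)]

  for (q, a, c, r) in compose_arcs(T1, T2):
      print(f"{q} --{a}:{c}--> {r}")
  # (0, 0) --a:a--> (1, 0)      from a:b .o. b:a
  # (1, 0) --b:a--> (1, 0)      from b:b .o. b:a
  # (b:b .o. a:b contributes nothing, as on the slide above)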
f∘g = { (x, z) : ∃y  (x, y) ∈ f and (y, z) ∈ g },  where x, y, z are strings   [recap of the relation-composition definition and Greek-letter example above]
Composition with Sets
• We’ve defined A .o. B where both are FSTs
• Now extend the definition to allow one to be an FSA
• Two relations (FSTs):  A∘B = { (x, z) : ∃y  (x, y) ∈ A and (y, z) ∈ B }
• Set and relation:  A∘B = { (x, z) : x ∈ A and (x, z) ∈ B }
• Relation and set:  A∘B = { (x, z) : (x, z) ∈ A and z ∈ B }
• Two sets (acceptors), same as intersection:  A∘B = { x : x ∈ A and x ∈ B }
Composition and Coercion
• Really just treats a set as the identity relation on that set:  {abc, pqr, …}  becomes  { (abc, abc), (pqr, pqr), … }
• Two relations (FSTs):  A∘B = { (x, z) : ∃y  (x, y) ∈ A and (y, z) ∈ B }
• Set and relation is now a special case (if ∃y then y = x):  A∘B = { (x, z) : (x, x) ∈ A and (x, z) ∈ B }
• Relation and set is now a special case (if ∃y then y = z):  A∘B = { (x, z) : (x, z) ∈ A and (z, z) ∈ B }
• Two sets (acceptors) is now a special case:  A∘B = { (x, x) : (x, x) ∈ A and (x, x) ∈ B }
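A small sketch of the coercion idea: turn a set of strings into the identity relation on that set, then the single relation-composition definition covers all four cases (compose repeats the helper sketched earlier; Greek is the toy relation from the earlier example):

  def compose(r1, r2):
      return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

  def as_relation(obj):
      """Coerce a set of strings to the identity relation; pass relations through."""
      return {(x, x) for x in obj} if all(isinstance(e, str) for e in obj) else set(obj)

  Greek = {("abcd", "abgd"), ("abed", "abed"), ("abed", "abd")}

  print(compose(as_relation({"abed"}), Greek))
  # {('abed', 'abed'), ('abed', 'abd')}   (in some order)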
3 Uses of Set Composition:
• Feed a string into the Greek transducer:
  • { (abed, abed) } .o. Greek = { (abed, abed), (abed, abd) }
  • { abed } .o. Greek = { (abed, abed), (abed, abd) }
  • [ { abed } .o. Greek ].l = { abed, abd }
• Feed several strings in parallel:
  • { abcd, abed } .o. Greek = { (abcd, abgd), (abed, abed), (abed, abd) }
  • [ { abcd, abed } .o. Greek ].l = { abgd, abed, abd }
• Filter the result via No = { abgd, abd, … }:
  • { abcd, abed } .o. Greek .o. No = { (abcd, abgd), (abed, abd) }
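The same three uses, sketched with the toy Greek relation as a finite set of pairs; the filter set No here contains just two strings, purely for illustration:

  def compose(r1, r2):
      return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

  def ident(strings):                 # coerce a set of strings to the identity relation
      return {(s, s) for s in strings}

  def lower(rel):                     # the .l (output) language
      return {y for x, y in rel}

  Greek = {("abcd", "abgd"), ("abed", "abed"), ("abed", "abd")}
  No = ident({"abgd", "abd"})         # assumed filter language

  parallel = compose(ident({"abcd", "abed"}), Greek)
  print(lower(parallel))              # {'abgd', 'abed', 'abd'}   (in some order)
  print(compose(parallel, No))        # {('abcd', 'abgd'), ('abed', 'abd')}   (in some order)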
What are the “basic” transducers?
• The operations on the previous slides combine transducers into bigger ones
• But where do we start?
  • a:ε  for  a ∈ Σ
  • ε:x  for  x ∈ Δ
• Q: Do we also need a:x?  How about ε:ε ?
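One hedged way to look at the question, again treating relations as finite sets of pairs: composing a:ε with ε:x already yields the pair (a, x), which suggests a:x is at least derivable from the two basic kinds of arcs (whether a toolkit also treats a:x as primitive is a separate, practical matter).

  def compose(r1, r2):
      return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

  a_to_eps = {("a", "")}     # a:ε, with ε written as the empty string
  eps_to_x = {("", "x")}     # ε:x

  print(compose(a_to_eps, eps_to_x))   # {('a', 'x')}, i.e. a:x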