CSA3050: NLP Algorithms
Finite State Transducers for Morphological Parsing
Resumé
• FSAs are equivalent to regular languages.
• FSTs are equivalent to regular relations (over pairs of regular languages).
• FSTs are like FSAs, but with complex labels: each arc carries a pair of symbols rather than a single symbol.
• We can use FSTs to transduce between the surface and lexical levels.
Dotted Pair Notation
[Figure: (1) an FSA recogniser for "fox", with arcs f, o, x; (2) FST transducers for fox/fox, with arcs f:f o:o x:x, and for goose/geese, with arcs g:g o:e o:e s:s e:e.]
Dotted Pair Notation (2)
[Figure: the goose/geese transducer with default pairs abbreviated: g o:e o:e s e.]
• By convention, x:y pairs lexical symbol x with surface symbol y.
• Also by convention, FSTs often contain "default pairs" of the form x:x. These are usually written simply as "x".
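A minimal Python sketch of dotted pair notation, assuming a hypothetical format in which one path is written as space-separated tokens, each either an explicit pair x:y or a default pair x. The helpers `parse_pairs` and `sides` are illustrative names, not part of any library.

```python
def parse_pairs(notation: str) -> list[tuple[str, str]]:
    """Expand a path such as "g o:e o:e s e" into (lexical, surface) pairs."""
    pairs = []
    for token in notation.split():
        if ":" in token:
            lexical, surface = token.split(":", 1)  # explicit pair x:y
        else:
            lexical = surface = token  # default pair: "x" abbreviates x:x
        pairs.append((lexical, surface))
    return pairs


def sides(pairs: list[tuple[str, str]]) -> tuple[str, str]:
    """Read off the lexical (upper) and surface (lower) strings of a path."""
    return "".join(lex for lex, _ in pairs), "".join(srf for _, srf in pairs)


goose = parse_pairs("g o:e o:e s e")
print(goose)         # [('g', 'g'), ('o', 'e'), ('o', 'e'), ('s', 's'), ('e', 'e')]
print(sides(goose))  # ('goose', 'geese')
```

The abbreviated path and the fully spelled-out g:g o:e o:e s:s e:e expand to the same list of pairs, which is why the shorthand loses no information.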
FSA for Number Inflection
How can we augment this to produce an analysis?
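As a reference point, here is a minimal recogniser in the spirit of a number-inflection FSA. It assumes arcs labelled with word-class symbols (reg-noun, irreg-sg-noun, irreg-pl-noun) plus a plural suffix s; these class names and the table are assumptions, since the original figure is not reproduced. The recogniser answers only yes or no, which is exactly the limitation the question above points at.

```python
# Hypothetical transition table: (state, symbol) -> next state.
ARCS = {
    (0, "reg-noun"): 1,
    (0, "irreg-sg-noun"): 2,
    (0, "irreg-pl-noun"): 2,
    (1, "s"): 2,  # regular plural suffix
}
FINAL = {1, 2}


def accepts(symbols: list[str]) -> bool:
    """Deterministic FSA recognition over a sequence of word-class symbols."""
    state = 0
    for sym in symbols:
        state = ARCS.get((state, sym))
        if state is None:  # no arc for this symbol: reject
            return False
    return state in FINAL


print(accepts(["reg-noun", "s"]))       # True
print(accepts(["irreg-pl-noun", "s"]))  # False
```

Accepting reg-noun s tells us the sequence is a well-formed noun, but nothing records that the s means plural; producing +N +PL requires output labels on the arcs, i.e. an FST.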
3 Steps
• Create a transducer Tnum for noun number inflection. This will add number and category information, given word classes as input.
• Create a transducer Tstems mapping words to word classes.
• Hook the two together.
Tnum example
[Figure: Tnum relates the "lexical" string reg-noun-stem +N +PL to the "intermediate" string reg-noun-stem ^ s #.]
1. Tnum: Noun Number Inflection
• multi-character symbols (e.g. +N, +PL, reg-noun-stem)
• morpheme boundary ^
• word boundary #
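A minimal sketch of Tnum in the generation direction (lexical in, intermediate out), treating each multi-character symbol as a single list item. Only the reg-noun-stem +N +PL path comes from the example above; the +SG arc is an assumption added for illustration. Each arc may emit several symbols at once, a shorthand for a chain of ε-input arcs.

```python
from typing import Optional

# Hypothetical Tnum: (state, lexical symbol) -> (intermediate symbols emitted, next state)
TNUM = {
    (0, "reg-noun-stem"): (["reg-noun-stem"], 1),
    (1, "+N"): ([], 2),                # category tag: no intermediate output
    (2, "+PL"): (["^", "s", "#"], 3),  # plural: morpheme boundary, suffix, word boundary
    (2, "+SG"): (["#"], 3),            # assumed singular arc: word boundary only
}
FINAL = {3}


def run_tnum(lexical: list[str]) -> Optional[list[str]]:
    """Map a lexical symbol sequence to its intermediate form, or None if rejected."""
    state, output = 0, []
    for sym in lexical:
        if (state, sym) not in TNUM:
            return None
        emitted, state = TNUM[(state, sym)]
        output.extend(emitted)
    return output if state in FINAL else None


print(run_tnum(["reg-noun-stem", "+N", "+PL"]))  # ['reg-noun-stem', '^', 's', '#']
```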
Tstems example
[Figure: Tstems relates the "intermediate" symbol reg-noun-stem, followed by the word boundary #, to "surface" stems such as dog (d:d o:o g:g) and fox (f:f o:o x:x).]
Tstems example (2)
[Figure: Tstems relates the "intermediate" symbol irreg-pl-noun-form, followed by #, to "surface" irregular plurals such as mice (using pairs such as o:i and u:ε) and sheep (s h e e p).]
2. Tstems Lexicon
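The lexicon figure is not reproduced here, so the sketch below assumes a hypothetical lexicon in which each class symbol maps to a small set of surface stems (dog, fox, mice and sheep come from the examples above; cat is added for illustration). Irregular plurals are listed whole, a simplification of letter-by-letter pairs such as o:i. Because one class symbol relates to several stems, Tstems is nondeterministic, so `run_tstems` (a hypothetical name) returns every output.

```python
# Hypothetical lexicon: intermediate class symbol -> surface stems it can realise.
TSTEMS_LEXICON = {
    "reg-noun-stem": ["dog", "fox", "cat"],
    "irreg-pl-noun-form": ["mice", "sheep"],
}


def run_tstems(intermediate: list[str]) -> list[str]:
    """Return every string Tstems relates to the given intermediate symbols."""
    results = [""]
    for sym in intermediate:
        # Class symbols branch into their stems; ^, s, # etc. pass through as x:x.
        choices = TSTEMS_LEXICON.get(sym, [sym])
        results = [prefix + choice for prefix in results for choice in choices]
    return results


print(run_tstems(["reg-noun-stem", "^", "s", "#"]))
# ['dog^s#', 'fox^s#', 'cat^s#']
```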
Hooking Together
• There are two ways to hook the two transducers together:
  • Cascading: feeding the output of one transducer into the input of the other and running them in series.
  • Composition: composing the two transducers mathematically to create a third, equivalent transducer.
Hooking Together: cascading
[Figure: the lexical level (reg-noun-stem +N +PL) is mapped by Tnum to the intermediate level (reg-noun-stem ^ s #), which Tstems maps to the surface level (dog or fox, plus s and #).]
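Cascading can be sketched by treating each transducer as a function from one input string to the list of its outputs, and running the functions in series. `t_num` and `t_stems` are hypothetical stand-ins that cover only the example above; in this sketch `t_stems` passes ^ s # through unchanged, leaving the boundary symbols for later processing.

```python
from typing import Callable

Transducer = Callable[[str], list[str]]


def t_num(lexical: str) -> list[str]:
    """Stand-in for Tnum: lexical -> intermediate (only the example path)."""
    table = {"reg-noun-stem +N +PL": ["reg-noun-stem ^ s #"]}
    return table.get(lexical, [])


def t_stems(intermediate: str) -> list[str]:
    """Stand-in for Tstems: expand the class symbol into each stem."""
    if "reg-noun-stem" not in intermediate:
        return []
    return [intermediate.replace("reg-noun-stem", stem) for stem in ("dog", "fox")]


def cascade(*transducers: Transducer) -> Transducer:
    """Run transducers in series: every output of one is fed to the next."""
    def run(text: str) -> list[str]:
        outputs = [text]
        for t in transducers:
            outputs = [y for x in outputs for y in t(x)]
        return outputs
    return run


print(cascade(t_num, t_stems)("reg-noun-stem +N +PL"))
# ['dog ^ s #', 'fox ^ s #']
```

The intermediate strings exist explicitly at run time here; composition, by contrast, eliminates them by building a single machine in advance.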
Composition of Relations
• Let R and S be binary relations.
• The composition of R and S, written R ∘ S, is defined by:
  $(a,c) \in R \circ S \iff \exists b\,\big[(a,b) \in R \wedge (b,c) \in S\big]$
• Transducers can also be composed.
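For finite relations, the definition above translates directly into a set comprehension. The relations below are hypothetical finite fragments of Tnum and Tstems. For actual transducers, composition is instead computed on the machines themselves, by a product construction whose states are pairs of states of the two transducers.

```python
def compose(r: set, s: set) -> set:
    """R ∘ S: all (a, c) such that some b has (a, b) in R and (b, c) in S."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


# Hypothetical finite fragments of the two relations.
T_NUM = {("reg-noun-stem +N +PL", "reg-noun-stem ^ s #")}
T_STEMS = {
    ("reg-noun-stem ^ s #", "dog ^ s #"),
    ("reg-noun-stem ^ s #", "fox ^ s #"),
}

print(sorted(compose(T_NUM, T_STEMS)))
# [('reg-noun-stem +N +PL', 'dog ^ s #'), ('reg-noun-stem +N +PL', 'fox ^ s #')]
```

Order matters: `compose(T_STEMS, T_NUM)` is empty, because no output of T_STEMS is an input of T_NUM.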
Tnum ∘ Tstems
English Spelling Rules
• consonant doubling: beg/begging
• y replacement: try/tries
• k insertion: panic/panicked
• e deletion: make/making
• e insertion: watch/watches
• Each rule can be stated in more detail ...
e Insertion Rule
• Insert an e on the surface tape just when the lexical tape has a morpheme ending in x, s, z, or ch and the next and final morpheme is -s.
• Stated formally: ε → e / [x|s|z|ch] ^ __ s#
  (rewrite ε as e when preceded by x, s, z, or ch plus a morpheme boundary, and followed by s plus a word boundary)
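The rule can be approximated with a regular-expression rewrite over intermediate strings such as fox^s#, followed by erasing the boundary symbols. This is a sketch under simplifying assumptions (one word per string, boundaries written literally as ^ and #), not the FST construction; `to_surface` is a hypothetical helper name.

```python
import re

# ε → e / [x|s|z|ch] ^ __ s#  : insert e between the boundary and a final -s.
E_INSERTION = re.compile(r"(x|s|z|ch)\^(s#)")


def to_surface(intermediate: str) -> str:
    """Apply e insertion, then delete ^ and #, which have no surface form."""
    with_e = E_INSERTION.sub(r"\1^e\2", intermediate)
    return with_e.replace("^", "").replace("#", "")


for form in ["fox^s#", "watch^s#", "buzz^s#", "dog^s#"]:
    print(form, "->", to_surface(form))
# fox^s# -> foxes
# watch^s# -> watches
# buzz^s# -> buzzes
# dog^s# -> dogs
```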
e insertion over 3 levels
The rule corresponds to the mapping between the intermediate and surface levels.
e insertion as an FST
Incorporating Spelling Rules
• Spelling rules, each corresponding to an FST, can be run in parallel provided that they are "aligned".
• The set of spelling rules is positioned between the surface level and the intermediate level.
• Parallel execution of FSTs can be carried out:
  • by simulation: in this case the FSTs must first be aligned;
  • by first constructing a single FST corresponding to their intersection.
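A minimal sketch of running aligned rules in parallel by simulation. Each rule is modelled as a small deterministic automaton over (intermediate, surface) symbol pairs, with ε written as 0 in the style of two-level morphology so that both sides of every pair string have the same length. At each step all rules read the same pair, and a pairing is accepted only if every rule accepts, which is what intersecting the rules computes. Both rules are simplified, hypothetical versions: one deletes boundary symbols, the other allows an inserted e (0:e) only right after x, s, z or ch followed by ^ (the right context is not checked).

```python
EPS = "0"  # two-level style ε symbol, so aligned strings have equal length


def boundary_rule(state, pair):
    """^ and # must map to ε on the surface. Single state 0."""
    lexical, surface = pair
    if lexical in ("^", "#") and surface != EPS:
        return None  # reject
    return 0


def e_insertion_rule(state, pair):
    """0:e is allowed only after [x|s|z|ch] ^ (left context only, simplified).

    States: 0 = other, 1 = after x/s/z/ch, 2 = after that plus ^, 3 = after c.
    """
    lexical, _ = pair
    if pair == (EPS, "e"):
        return 0 if state == 2 else None
    if lexical in ("x", "s", "z") or (lexical == "h" and state == 3):
        return 1
    if lexical == "c":
        return 3
    if lexical == "^" and state == 1:
        return 2
    return 0


# Each rule: (start state, step function, accepting states).
RULES = [(0, boundary_rule, {0}), (0, e_insertion_rule, {0, 1, 2, 3})]


def accepts_all(rules, pairs):
    """Run all rules in lockstep over the same aligned pair sequence."""
    states = [start for start, _, _ in rules]
    for pair in pairs:
        states = [step(q, pair) for q, (_, step, _) in zip(states, rules)]
        if None in states:
            return False
    return all(q in finals for q, (_, _, finals) in zip(states, rules))


def align(notation):
    """Hypothetical helper: "f o x ^:0 0:e s #:0" -> list of pairs."""
    return [tuple(t.split(":")) if ":" in t else (t, t) for t in notation.split()]


print(accepts_all(RULES, align("f o x ^:0 0:e s #:0")))  # True  (foxes)
print(accepts_all(RULES, align("d o g ^:0 0:e s #:0")))  # False (*doges)
```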
Putting it all together
Execution of the FSTi takes place in parallel.
Kaplan and Kay: The Xerox View
[Figure: two configurations of the spelling-rule transducers: the FSTi aligned but separate, and the FSTi intersected together.]
Operations over FSTs
• We can perform operations over FSTs which yield other FSTs:
  • inversion
  • union
  • composition
• The inversion of T, written T⁻¹, simply computes the inverse of the mapping computed by T.
Inversion
[Figure: T maps lexical c a t ^ PL to surface c a t s; T⁻¹ maps surface c a t s back to lexical c a t ^ PL.]
Inversion
• To invert a transducer, we either:
  • swap the two halves of every complex symbol, i.e. every i:o becomes o:i, or
  • leave the transducer alone and slightly change the parsing algorithm.
• Practical consequences:
  • the transducer is reversible;
  • we can use exactly the same transducer to perform either analysis or generation.
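The first strategy, swapping each label, is easy to sketch. Below, a transducer is a hypothetical list of arcs (source, input, output, target) for the single path relating cat ^ PL to cats; the same `transduce` routine generates with T and analyses with T⁻¹. This is an illustrative nondeterministic search, not a production parser.

```python
EPS = ""  # ε: an arc side that reads or writes nothing

# Hypothetical T (lexical -> surface) for cat^PL -> cats: (src, in, out, dst).
T = [(0, "c", "c", 1), (1, "a", "a", 2), (2, "t", "t", 3),
     (3, "^", EPS, 4), (4, "PL", "s", 5)]
FINALS = {5}


def invert(arcs):
    """T⁻¹: every i:o label becomes o:i; states and arcs are otherwise unchanged."""
    return [(src, o, i, dst) for src, i, o, dst in arcs]


def transduce(arcs, finals, symbols, state=0):
    """All outputs for reading the whole symbol list (assumes no ε-input cycles)."""
    results = [""] if not symbols and state in finals else []
    for src, i, o, dst in arcs:
        if src != state:
            continue
        if i == EPS:
            rest = symbols          # ε-input arc: consume nothing
        elif symbols and symbols[0] == i:
            rest = symbols[1:]      # consume one input symbol
        else:
            continue
        results += [o + tail for tail in transduce(arcs, finals, rest, dst)]
    return results


print(transduce(T, FINALS, ["c", "a", "t", "^", "PL"]))  # generation: ['cats']
print(transduce(invert(T), FINALS, list("cats")))        # analysis: ['cat^PL']
```

Inverting twice returns the original arc list, reflecting that (T⁻¹)⁻¹ = T.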
Closure Properties of FSTs
Relations computed by FSTs are:
• closed under
  • inversion
  • union
  • composition
• not closed (in general) under
  • intersection (however, intersection is possible provided that we restrict the class of transducers)
  • complementation
  • subtraction
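To see why intersection fails in general, here is a standard counterexample (an illustration added here, not taken from the slides): two relations, each computed by a small FST written in dotted pair notation, whose intersection is not a regular relation.

```latex
\begin{aligned}
R_1 &= \{(a^n b^m,\; c^n) \mid n, m \ge 0\}, && \text{computed by } (a{:}c)^*\,(b{:}\varepsilon)^* \\
R_2 &= \{(a^n b^m,\; c^m) \mid n, m \ge 0\}, && \text{computed by } (a{:}\varepsilon)^*\,(b{:}c)^* \\
R_1 \cap R_2 &= \{(a^n b^n,\; c^n) \mid n \ge 0\}
\end{aligned}
```

A pair (a^n b^m, y) lies in both relations only if y = c^n = c^m, i.e. n = m. The domain of any regular relation is a regular language, but the domain of R₁ ∩ R₂ is {aⁿbⁿ}, which is not regular, so R₁ ∩ R₂ cannot be computed by an FST. Complementation and subtraction fail for the same reason: R₁ ∩ R₂ = R₁ − (R₁ − R₂), and intersection can also be expressed through union and complement, so closure under either operation would imply closure under intersection.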