Human Language Technology Finite State Transducers
Acknowledgement
• Material in this lecture is derived or copied in part from:
• Richard Sproat, CL46 lectures
• Lauri Karttunen, LSA lectures, 2005
• Shuly Wintner, 2008, Malta
HLT: finite state transducers
Three Key Concepts
• Regular Relations
• Finite State Transducers
• Computational Morphology
A Regular Set
L1 = { ab, abab, ababab, abababab, ababababab, ... }
Two Regular Sets
L1 = { ab, abab, ababab, abababab, ababababab, ... }
L2 = { ba, baba, bababa, babababa, bababababa, ... }
A Regular Relation L1 x L2
L1 = { ab, abab, ababab, abababab, ababababab, ... }
L2 = { ba, baba, bababa, babababa, bababababa, ... }
Written as a set of pairs: { ("ab","ba"), ("abab","baba"), ... }
Some closure properties for regular relations
• Concatenation [R1 R2]
• Power (R^n)
• Reversal
• Inversion (R^-1)
• Composition: R1 ○ R2
Concatenation and Power
• Concatenation: R1 = { ("a","b") }, R2 = { ("c","d") }, [R1 R2] = { ("ac","bd") }
• Power: R1+ = { ("a","b"), ("aa","bb"), ("aaa","bbb"), ... }
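The two operations above can be tried out on finite relations stored as Python sets of string pairs. This is only an illustrative sketch: the helper names `concat` and `power` are assumptions made for this example, and real regular relations are usually infinite and are handled through transducers rather than enumerated sets.

```python
from itertools import product

def concat(r1, r2):
    """Concatenate two finite relations: join upper with upper, lower with lower."""
    return {(x1 + x2, y1 + y2) for (x1, y1), (x2, y2) in product(r1, r2)}

def power(r, n):
    """n-fold concatenation of r with itself, for n >= 1."""
    result = r
    for _ in range(n - 1):
        result = concat(result, r)
    return result

R1 = {("a", "b")}
R2 = {("c", "d")}

print(concat(R1, R2))  # {('ac', 'bd')}
print(power(R1, 3))    # {('aaa', 'bbb')}
```

R1+ from the slide is then the union of `power(R1, n)` over all n >= 1, which can only be enumerated up to some chosen bound.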
Composition
• R1 ○ R2 denotes the composition of relations R1 and R2.
• Definition: if R1 contains <x,y> and R2 contains <y,z>, then R1 ○ R2 contains <x,z>.
• R1 and R2 must be relations. If either is just a language, it is assumed to abbreviate the identity relation.
• R1 ○ R2 is written [R1 .o. R2] in xfst.
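The definition of composition can be sketched directly on finite relations. The example pairs below, including the intermediate forms `cat^s` and `fox^s`, are hypothetical and chosen only to show one relation feeding another; `compose` and `identity` are names introduced for this sketch.

```python
def compose(r1, r2):
    """<x,z> is in the result whenever r1 contains <x,y> and r2 contains <y,z>."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

def identity(language):
    """A plain language abbreviates the identity relation on its strings."""
    return {(w, w) for w in language}

# Hypothetical two-step mapping: lexical form -> intermediate form -> surface form.
R1 = {("cat+N+PL", "cat^s"), ("fox+N+PL", "fox^s")}
R2 = {("cat^s", "cats"), ("fox^s", "foxes")}

print(compose(R1, R2))
# Composing with a language restricts the relation to those strings.
print(compose(identity({"cat+N+PL"}), compose(R1, R2)))
```

The second call illustrates the convention on the slide: a language used where a relation is expected is treated as its identity relation.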
Closure Properties of Regular Languages and Relations
Operation         Regular Languages   Regular Relations
Union             yes                 yes
Concatenation     yes                 yes
Iteration         yes                 yes
Intersection      yes                 no
Subtraction       yes                 no
Complementation   yes                 no
Composition       n/a                 yes
Morphology as a Regular Relation
surface language    lexical language
cat                 cat
cats                cat+N+PL
mice                mouse+N+PL
lives               life+N+PL, live+V+3SING
...
or { ("cat","cat"), ("cats","cat+N+PL"), ... }
Part-of-Speech Tagging
• I     know  some  new  tricks
  PRON  V     DET   ADJ  N
• said  the  Cat  in  the  Hat
  V     DET  N    P   DET  N
Singular-to-plural mapping:
• cat   hat   ox    child     mouse  sheep
• cats  hats  oxen  children  mice   sheep
Three Key Concepts
• Regular Relations
• Finite State Transducers
• Computational Morphology
FSA
• Used for
• Recognition
• Generation
[Diagram: an automaton arc labelled a]
Finite State Transducers
• A finite state transducer (FST) is essentially a finite state automaton (FSA) that works on two (or more) tapes.
• The most common way to think about transducers is as a kind of translating machine that reads from one tape and writes onto the other.
FST Definition
• A 2-way FST is a quintuple (K, s, F, Σi x Σo, δ) where
• Σi, Σo are input and output alphabets
• K is a finite set of states
• s ∈ K is an initial state
• F ⊆ K are final states
• δ is a transition relation of type K x Σi x Σo x K
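The quintuple can be written down almost literally as a data structure. The sketch below is an assumption-laden illustration, not a reference implementation: the class name `FST` and function `transduce` are invented here, the alphabets are left implicit in the transitions, and every transition is assumed to consume exactly one input symbol (the output may be the empty string).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FST:
    states: frozenset       # K
    start: object           # s, a member of K
    finals: frozenset       # F, a subset of K
    transitions: frozenset  # delta: tuples (source, in_symbol, out_symbol, target)

def transduce(fst, word):
    """Return every output string the FST can write while reading `word`."""
    configs = {(fst.start, "")}  # pairs of (current state, output written so far)
    for ch in word:
        configs = {(q, out + o)
                   for (state, out) in configs
                   for (p, i, o, q) in fst.transitions
                   if p == state and i == ch}
    return {out for (state, out) in configs if state in fst.finals}

# A one-state machine that rewrites every a as b.
a_to_b = FST(frozenset({0}), 0, frozenset({0}), frozenset({(0, "a", "b", 0)}))

print(transduce(a_to_b, "aaa"))  # {'bbb'}
print(transduce(a_to_b, "ab"))   # set(): no transition reads b
```

Tracking a set of configurations rather than a single state keeps the sketch correct for nondeterministic transition relations.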
FST
• Used for
• Recognition
• Generation
• Translation
[Diagram: an arc with a on the upper tape and b on the lower tape]
A Very Simple Transducer
• Relation: { ("a","b") }
• Notation: a:b encodes the transition
• In the diagram the arc carries a on the upper side and b on the lower side, also written as a:b
Symbol Pairs
• Symbols vs. symbol pairs
• In general, no distinction is made in xfst between a, the language {"a"}, and a:a, the identity relation { ("a","a") }
A (more interesting) Transducer
• Relation: { ("a","b"), ("aa","bb"), ... }
• Notation: a:b*
• N.B. with this notation a and b must be single symbols
[Diagram: a single state with a loop labelled a:b]
Transducers Have Several Modes of Operation
• generation mode: It writes on both tapes: a string of a's on one tape and a string of b's on the other. Both strings have the same length.
• recognition mode: It accepts when the word on the first tape consists of exactly as many a's as the word on the second tape consists of b's.
• translation mode (left to right): It reads a's from the first tape and writes a b onto the second tape for every a that it reads.
• translation mode (right to left): It reads b's from the second tape and writes an a onto the first tape for every b that it reads.
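The four modes can be illustrated for the a:b* transducer with four small functions over the same relation. This is a sketch under the assumption that the relation is exactly the set of pairs (a^n, b^n); the function names are made up for the example.

```python
def generate(max_len):
    """Generation mode: enumerate pairs of tapes up to a length bound."""
    return [("a" * n, "b" * n) for n in range(max_len + 1)]

def recognize(upper, lower):
    """Recognition mode: accept a pair of tapes if it is in the relation."""
    return (set(upper) <= {"a"} and set(lower) <= {"b"}
            and len(upper) == len(lower))

def translate_down(upper):
    """Translation, first tape to second: write one b per a read."""
    return "b" * len(upper) if set(upper) <= {"a"} else None

def translate_up(lower):
    """Translation, second tape to first: write one a per b read."""
    return "a" * len(lower) if set(lower) <= {"b"} else None

print(generate(2))            # [('', ''), ('a', 'b'), ('aa', 'bb')]
print(recognize("aa", "bb"))  # True
print(translate_down("aaa"))  # bbb
print(translate_up("bb"))     # aa
```

All four functions describe one and the same relation; only the choice of which tape is given and which is produced differs.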
The Basic Idea • Morphology is regular • Morphology is finite state
Morphology is Regular
• The relation between the surface forms of a language and the corresponding lexical forms can be described as a regular relation, e.g. { ("leaf+N+Pl","leaves"), ("hang+V+Past","hung"), ... }
• Regular relations are closed under operations such as concatenation, iteration, union, and composition.
• Complex regular relations can be derived from simpler relations.
Morphology is finite-state
• A regular relation can be defined using the metalanguage of regular expressions:
  [{talk} | {walk} | {work}]
  [%+Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}];
• A regular expression can be compiled into a finite-state transducer that implements the relation computationally.
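Finite-state transducer
• Regular expression:
  [{talk} | {walk} | {work}]
  [%+Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}];
• Compilation turns the regular expression into a finite-state transducer.
[Diagram: from the initial state, arcs spell t a l k, w a l k and w o r k; these are followed by the suffix arcs +Base:ε, +3rdSg:s, +Progr:i ε:n ε:g, and +Past:e ε:d, which lead to the final state]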
Generation
• work+3rdSg --> works
Analysis
• talked --> talk+Past
[Diagram for both slides: the transducer above with the stem arcs written as identity pairs (t:t, a:a, l:l, k:k, w:w, o:o, r:r)]
XFST Demo 2
• Start xfst:
  % xfst
  xfst[0]:
• Compile a regular expression:
  xfst[0]: regex [{talk} | {walk} | {work}]
           [%+Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}];
• Apply the result:
  xfst[1]: apply up walked
  walk+Past
  xfst[1]: apply down talk+SgGen3
  talks
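Because this particular relation is finite, the behaviour of `apply up` and `apply down` can be mimicked in Python by enumerating the pairs explicitly. This is only an analogy to the xfst session, not how xfst works internally: xfst compiles the expression into a transducer, whereas the hypothetical helpers below simply search a set of pairs.

```python
STEMS = ["talk", "walk", "work"]
# Lexical tag -> surface ending, mirroring the second bracket of the regex.
SUFFIXES = {"+Base": "", "+SgGen3": "s", "+Progr": "ing", "+Past": "ed"}

# Each pair is (lexical form, surface form).
RELATION = {(stem + tag, stem + ending)
            for stem in STEMS
            for tag, ending in SUFFIXES.items()}

def apply_down(lexical):
    """Generation: lexical form to surface form(s)."""
    return sorted(s for (l, s) in RELATION if l == lexical)

def apply_up(surface):
    """Analysis: surface form to lexical form(s)."""
    return sorted(l for (l, s) in RELATION if s == surface)

print(apply_up("walked"))         # ['walk+Past']
print(apply_down("talk+SgGen3"))  # ['talks']
```

Both directions read the same set of pairs, which is the sense in which the relation is bidirectional.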
Lexical transducer
• Upper side: v o u l o i r +IndP +SG +P3 (citation form plus inflection codes)
• Lower side: v e u t (inflected form)
• The two sides are related by a finite-state transducer.
• Bidirectional: generation or analysis
• Compact and fast
• Comprehensive systems have been built for over 40 languages: English, German, Dutch, French, Italian, Spanish, Portuguese, Finnish, Russian, Turkish, Japanese, Korean, Basque, Greek, Arabic, Hebrew, Bulgarian, …
How lexical transducers are made
• Morphotactics: Lexicon Regular Expression --> Compiler --> Lexicon FST
• Alternations: Rules Regular Expressions --> Compiler --> Rule FSTs
• Lexicon FST and Rule FSTs --> composition --> Lexical Transducer (a single FST)
• Example: f a t +Adj +Comp on the upper side, f a t t e r on the lower side
Sequential Model
• Lexical form --> fst 1 --> Intermediate form --> fst 2 --> ... --> fst n --> Surface form
• An ordered sequence of rewrite rules (Chomsky & Halle ‘68) can be modeled by a cascade of finite-state transducers (Johnson ‘72, Kaplan & Kay ‘81).
Parallel Model
• Lexical form and Surface form are related by fst 1, fst 2, ..., fst n applying in parallel.
• A set of parallel two-level rules (constraints) is compiled into finite-state automata interpreted as transducers (Koskenniemi ‘83).
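Sequential vs. Parallel
• Chomsky & Halle 1968: Lexical form --> rule 1 --> Intermediate form --> rule 2 --> ... --> rule n --> Surface form. The rules are combined into a single FST by composition (compose).
• Koskenniemi 1983: rule 1, ..., rule n all relate the Lexical form to the Surface form in parallel. The rules are combined into a single FST by intersection (intersect).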
Sequential vs. Parallel Rules
• Sequential rules are combined by means of composition.
• Advantage: FSTs are closed under composition.
• Disadvantage: the result is sensitive to the order in which the rules apply.
• Parallel rules are combined by means of intersection.
• In general, FSTs are not closed under intersection.
• … but FSTs without ε-transitions are closed under intersection.
Crossproduct
• A .x. B: the relation that maps every string in A to every string in B, and vice versa.
• A:B: same as [A .x. B].
• Equivalent notations: a b c .x. x y, [a b c] : [x y], {abc}:{xy}
[Diagram: a path with arcs a:x, b:y, c:0]
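For finite languages the crossproduct is simply the Cartesian product of the two sets. The sketch below also shows one way the symbols of `abc` and `xy` can be paired into arcs, padding the shorter side with 0 as in the diagram labels; the left-to-right alignment and the helper names are assumptions made for this example.

```python
from itertools import product, zip_longest

def crossproduct(A, B):
    """Map every string in A to every string in B."""
    return {(a, b) for a, b in product(A, B)}

def align(upper, lower):
    """Pair symbols left to right, padding the shorter string with 0 (epsilon)."""
    return [f"{u}:{l}" for u, l in zip_longest(upper, lower, fillvalue="0")]

print(crossproduct({"abc"}, {"xy"}))  # {('abc', 'xy')}
print(align("abc", "xy"))             # ['a:x', 'b:y', 'c:0']
```

With larger languages the product grows multiplicatively: two strings on each side give four pairs.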
Composition
• A .o. B: the relation C such that if A maps x to y and B maps y to z, C maps x to z.
• Example: {abc} .o. [a:A | b:B | c:C | d:D]*
[Diagram: the path a b c, composed with a one-state loop over a:A, b:B, c:C, d:D, gives the path a:A b:B c:C]
Transducers are not closed under intersection
• T1 (arcs c:a and ε:b): T1(c^n) = { a^n b^m | m ≥ 0 }
• T2 (arcs ε:a and c:b): T2(c^n) = { a^m b^n | m ≥ 0 }
• (T1 ∩ T2)(c^n) = { a^n b^n }
• { a^n b^n } is not a regular language, so T1 ∩ T2 is not a regular relation.
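The counterexample can be checked mechanically for small n by enumerating the outputs of each transducer up to a bound. This is a bounded sanity check rather than a proof, and the functions `t1` and `t2` are hypothetical stand-ins that list outputs directly instead of simulating the machines.

```python
def t1(n, bound):
    """Outputs of T1 on c^n: exactly n a's, then any number of b's up to bound."""
    return {"a" * n + "b" * m for m in range(bound + 1)}

def t2(n, bound):
    """Outputs of T2 on c^n: any number of a's up to bound, then exactly n b's."""
    return {"a" * m + "b" * n for m in range(bound + 1)}

BOUND = 6
for n in range(BOUND + 1):
    common = t1(n, BOUND) & t2(n, BOUND)
    # Only the string with equally many a's and b's survives the intersection.
    assert common == {"a" * n + "b" * n}
    print(n, common)
```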
Xerox RE Operators
• $ containment
• => restriction
• -> replacement
• These operators make it easier to describe complex languages and relations without extending the formal power of finite-state systems.
Containment
• $a
• Equivalent expression: [?* a ?*]
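The equivalence of $a and [?* a ?*] has a close analogue in Python's `re` module, where `.` plays the role of xfst's `?`. The pattern and helper name below are illustrative assumptions; Python regular expressions are not xfst expressions.

```python
import re

# Analogue of [?* a ?*]: any symbols, then a, then any symbols.
CONTAINS_A = re.compile(r".*a.*", re.DOTALL)

def contains_a(s):
    """True if the whole string matches the containment pattern."""
    return CONTAINS_A.fullmatch(s) is not None

print(contains_a("banana"))  # True
print(contains_a("xyz"))     # False
```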
Restriction
• a => b _ c
• "Any a must be preceded by b and followed by c."
• Equivalent expression: ~[~[?* b] a ?*] & ~[?* a ~[c ?*]]
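The restriction a => b _ c defines a language: the strings in which every a has b immediately before it and c immediately after it. A direct membership test, with a function name invented for this sketch, might look like this:

```python
def satisfies_restriction(s):
    """True if every 'a' in s is immediately preceded by 'b' and followed by 'c'."""
    for i, ch in enumerate(s):
        if ch != "a":
            continue
        preceded_by_b = i > 0 and s[i - 1] == "b"
        followed_by_c = i + 1 < len(s) and s[i + 1] == "c"
        if not (preceded_by_b and followed_by_c):
            return False
    return True

print(satisfies_restriction("xbacy"))  # True
print(satisfies_restriction("bad"))    # False: a is not followed by c
print(satisfies_restriction("xyz"))    # True: no a at all
```

The two conditions correspond to the two conjuncts of the equivalent expression: no a whose left context fails to end in b, and no a whose right context fails to start with c.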
Replacement
• a b -> b a
• "Replace 'ab' by 'ba'."
• Equivalent expression: [[~$[a b] [[a b] .x. [b a]]]* ~$[a b]]
Replacement + Marking
• a|e|i|o|u -> %[ ... %]
• p o t a t o --> p[o]t[a]t[o]
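Both replacement examples have simple string-level counterparts in Python. These only mimic the downward (generation) direction on individual strings; the xfst operators define relations that can also be applied upward. The function names are assumptions for this sketch.

```python
import re

def replace_ab(s):
    """Counterpart of 'a b -> b a': rewrite each occurrence of ab as ba."""
    return s.replace("ab", "ba")

def mark_vowels(s):
    """Counterpart of 'a|e|i|o|u -> %[ ... %]': wrap each vowel in brackets."""
    return re.sub(r"[aeiou]", lambda m: "[" + m.group(0) + "]", s)

print(replace_ab("abab"))     # baba
print(mark_vowels("potato"))  # p[o]t[a]t[o]
```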
Conditional Replacement
• A -> B is the replacement; L _ R is the context.
• The relation that replaces A by B between L and R, leaving everything else unchanged.
Sequential application
• N -> m / _ p maps k a N p a n to k a m p a n
• p -> m / m _ maps k a m p a n to k a m m a n
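Sequential application in detail
• N -> m / _ p: k a N p a n --> k a m p a n, state sequence 0 0 0 2 0 0 0
• p -> m / m _: k a m p a n --> k a m m a n, state sequence 0 0 0 1 0 0 0
[Diagrams: a three-state transducer (states 0 to 2) with an N:m arc for the first rule, and a two-state transducer (states 0 and 1) with a p:m arc for the second rule]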
Composition
• The composed transducer maps k a N p a n directly to k a m m a n, state sequence 0 0 0 3 0 0 0
[Diagram: a single four-state transducer (states 0 to 3) with arcs including N:m and p:m]
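The cascade of the two rules and its composition into a single mapping can be imitated with string rewriting. This sketch uses Python lookahead and lookbehind patterns in place of the rule transducers, so it shows the input and output behaviour rather than the state machines themselves; all function names are hypothetical.

```python
import re

def n_to_m_before_p(s):
    """N -> m / _ p : rewrite N as m when the next symbol is p."""
    return re.sub(r"N(?=p)", "m", s)

def p_to_m_after_m(s):
    """p -> m / m _ : rewrite p as m when the previous symbol is m."""
    return re.sub(r"(?<=m)p", "m", s)

def compose(*rules):
    """Build one mapping that applies the rules in the given order."""
    def composed(s):
        for rule in rules:
            s = rule(s)
        return s
    return composed

cascade = compose(n_to_m_before_p, p_to_m_after_m)
swapped = compose(p_to_m_after_m, n_to_m_before_p)

print(n_to_m_before_p("kaNpan"))  # kampan (intermediate form)
print(cascade("kaNpan"))          # kamman
print(swapped("kaNpan"))          # kampan: the second rule found no m before p
```

The last line shows why rule order matters for sequential rules: with the order swapped, the p -> m rule applies before any m has been created.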