240 likes | 272 Views
This presentation covers Finite State Acceptors, Regexps, Determinization, Minimization, Morphology, Transducers, Regular Relations, Function Composition, Lexical Transducers, and Inverting Relations in NLP. Learn about efficient algorithms and practical applications of finite-state methods in natural language processing.
E N D
Finite-State Methods 600.465 - Intro to NLP - J. Eisner
c a e Finite state acceptors (FSAs) • Regexps • Union, Kleene *, concat, intersect, complement, reversal • Determinization, minimization • Pumping, Myhill-Nerode 600.465 - Intro to NLP - J. Eisner
slide courtesy of L. Karttunen (modified) Network Wordlist c l e a r clear clever ear ever fat father e v e compile f a t h Example: Unweighted acceptor FSM 17728 states, 37100 arcs 0.6 sec /usr/dict/words 25K words 206K chars 600.465 - Intro to NLP - J. Eisner
slide courtesy of L. Karttunen (modified) Wordlist clear 0 clever 1 ear 2 ever 3 fat 4 father 5 Example: Weighted acceptor Network c/0 l/0 e/0 a/0 r/0 e/2 v/1 e/0 compile f/4 a/0 t/0 h/1 Computes a perfect hash! 600.465 - Intro to NLP - J. Eisner
slide courtesy of L. Karttunen (modified) Wordlist c/0 l/0 e/0 a/0 r/0 clear 0 clever 1 ear 2 ever 3 fat 4 father 5 e/2 v/1 e/0 f/4 a/0 t/0 h/1 Example: Weighted acceptor Network c l e a r 2 6 1 2 1 2 e v e compile f a t h 2 1 2 2 • Successor states partition the path set • Use offsets of successor states as arc weights • Q: Would this work for an arbitrary numbering of the words? • Compute number of paths from each state (Q: how?) 600.465 - Intro to NLP - J. Eisner
the problem of morphology (“word shape”) - an area of linguistics Example: Unweighted transducer VP [head=vouloir,...] V[head=vouloir,tense=Present,num=SG, person=P3] ... veut 600.465 - Intro to NLP - J. Eisner
slide courtesy of L. Karttunen (modified) vouloir +Pres +Sing + P3 Finite-state transducer veut canonical form inflection codes v o u l o i r +Pres +Sing +P3 v e u t inflected form Example: Unweighted transducer VP [head=vouloir,...] V[head=vouloir,tense=Present,num=SG, person=P3] ... veut the relevant path 600.465 - Intro to NLP - J. Eisner
slide courtesy of L. Karttunen vouloir +Pres +Sing + P3 Finite-state transducer veut canonical form inflection codes v o u l o i r +Pres +Sing +P3 v e u t inflected form Example: Unweighted transducer • Bidirectional: generation or analysis • Compact and fast • Xerox sells for about 20 languges including English, German, Dutch, French, Italian, Spanish, Portuguese, Finnish, Russian, Turkish, Japanese, ... • Research systems for many other languages, including Arabic, Malay the relevant path 600.465 - Intro to NLP - J. Eisner
Regular Relation (of strings) • Relation: like a function, but multiple outputs ok • Regular: finite-state • Transducer: automaton w/ outputs • b ? a ? • aaaaa ? a:e b:b {b} {} a:a a:c {ac, aca, acab, acabc} ?:c b:b ?:b • Invertible? • Closed under composition? ?:a b:e 600.465 - Intro to NLP - J. Eisner
Regular Relation (of strings) • Can weight the arcs: vs. • b {b} a {} • aaaaa {ac, aca, acab, acabc} • How to find best outputs? • For aaaaa? • For all inputs at once? a:e b:b a:a a:c ?:c b:b ?:b ?:a b:e 600.465 - Intro to NLP - J. Eisner
Acceptors (FSAs) Transducers (FSTs) c c:z a a:x Unweighted e e:y c:z/.7 c/.7 a:x/.5 a/.5 Weighted .3 .3 e:y/.5 e/.5 Function from strings to ... {false, true} strings numbers (string, num) pairs 600.465 - Intro to NLP - J. Eisner
Sample functions Acceptors (FSAs) Transducers (FSTs) {false, true} strings Grammatical? Markup Correction Translation Unweighted numbers (string, num) pairs How grammatical? Better, how likely? Good markups Good corrections Good translations Weighted 600.465 - Intro to NLP - J. Eisner
abcd abcd f g Functions ab?d 600.465 - Intro to NLP - J. Eisner
Functions ab?d abcd Function composition: f g [first f, then g – intuitive notation, but opposite of the traditional math notation] 600.465 - Intro to NLP - J. Eisner
g 3 4 abgd 2 2 abed abed 8 6 abd abjd ... From Functions to Relations f ab?d abcd 600.465 - Intro to NLP - J. Eisner
From Functions to Relations 4 ab?d abgd 2 abed Relation composition: f g 8 abd ... 600.465 - Intro to NLP - J. Eisner
From Functions to Relations ab?d abed Pick best output Often in NLP, all of the functions or relations involved can be described as finite-state machines, and manipulated using standard algorithms. 600.465 - Intro to NLP - J. Eisner
g 3 4 abgd 2 2 abed abed 8 6 abd abjd ... Inverting Relations f ab?d abcd 600.465 - Intro to NLP - J. Eisner
Inverting Relations g-1 f-1 3 4 ab?d abcd abgd 2 2 abed abed 8 6 abd abjd ... 600.465 - Intro to NLP - J. Eisner
Inverting Relations 4 ab?d abgd 2 abed (f g)-1 = g-1 f-1 8 abd ... 600.465 - Intro to NLP - J. Eisner
slide courtesy of L. Karttunen (modified) c l e a r e v e f a t h Lexical Transducer (a single FST) Lexicon FSA composition Regular Expressions for Rules ComposedRule FSTs Compiler b i g +Adj +Comp g e b i g r one path Building a lexical transducer big | clear | clever | ear | fat | ... Regular Expression Lexicon 600.465 - Intro to NLP - J. Eisner
slide courtesy of L. Karttunen (modified) c l e a r e v e f a t h Lexicon FSA Building a lexical transducer • Actually, the lexicon must contain elements likebig +Adj +Comp • So write it as a more complicated expression:(big | clear | clever | fat | ...) +Adj ( | +Comp | +Sup) adjectives | (ear | father | ...) +Noun (+Sing | +Pl) nouns | ... ... • Q: Why do we need a lexicon at all? big | clear | clever | ear | fat | ... Regular Expression Lexicon 600.465 - Intro to NLP - J. Eisner
slide courtesy of L. Karttunen (modified) être+IndP +SG + P1 suivre+IndP+SG+P1 suivre+IndP+SG+P2 suivre+Imp+SG + P2 Weighted version of transducer: Assigns a weight to each string pair “upper language” 4 payer+IndP+SG+P1 19 20 12 Weighted French Transducer 50 3 suis paie “lower language” paye 600.465 - Intro to NLP - J. Eisner