This lecture explores recovering the real language X from noisy output Y using the Noisy Channel Model in natural language processing (NLP): correcting typos and misspellings, restoring deleted spaces, decoding pronunciation, and other text-recovery problems.
Finite-State and the Noisy Channel
600.465 - Intro to NLP - J. Eisner
Noisy Channel Model

real language X  →  noisy channel  →  yucky language Y

want to recover X from Y
Noisy Channel Model: spelling

real language X (correct spelling)  →  noisy channel (typos)  →  yucky language Y (misspellings)

want to recover X from Y
Noisy Channel Model: word segmentation

real language X = (lexicon space)*  →  noisy channel (delete spaces)  →  yucky language Y (text w/o spaces)

want to recover X from Y
Noisy Channel Model: speech

real language X = (lexicon space)*, scored by a language model  →  noisy channel (pronunciation), scored by an acoustic model  →  yucky language Y (speech)

want to recover X from Y
Noisy Channel Model: parsing

real language X = tree, scored by a probabilistic CFG  →  noisy channel (delete everything but terminals)  →  yucky language Y (text)

want to recover X from Y
Noisy Channel Model

real language X  →  noisy channel  →  yucky language Y
p(X) * p(Y | X) = p(X,Y)

want to recover x ∈ X from y ∈ Y: choose the x that maximizes p(x | y), or equivalently p(x,y). (By Bayes' rule, p(x | y) = p(x,y) / p(y), and p(y) is constant once y is observed, so the two maximizations agree.)
Speech Recognition by FST Composition (Pereira & Riley 1996)

p(word seq): trigram language model
  .o.
p(phone seq | word seq): pronunciation model (arcs like CAT : k æ t)
  .o.
p(acoustics | phone seq): acoustic model (phone-in-context arcs, e.g. ə)
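The .o. operator is weighted composition. As a rough illustration (a sketch of mine, not the paper's machinery: weighted relations here are plain dicts rather than FSTs, and the miniature models are invented), composition sums over the shared middle tape:

```python
from collections import defaultdict

# A weighted relation maps (input, output) string pairs to probabilities.
# compose(R, S) relates x to z through every shared middle string y,
# mirroring the .o. operator: p(x, z) = sum over y of R(x, y) * S(y, z).
def compose(R, S):
    out = defaultdict(float)
    for (x, y1), p in R.items():
        for (y2, z), q in S.items():
            if y1 == y2:
                out[(x, z)] += p * q
    return dict(out)

# Invented stand-ins for the language model and pronunciation model:
lm = {("cat", "cat"): 0.6, ("cad", "cad"): 0.4}          # p(word)
pron = {("cat", "k ae t"): 1.0, ("cad", "k ae d"): 1.0}  # p(phones | word)
print(compose(lm, pron))  # joint p(word, phone seq)
```

Real FST composition does this lazily over automaton states instead of enumerating string pairs, which is what makes the cascade tractable.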
Noisy Channel Model: p(X) * p(Y | X) = p(X,Y)

p(X):       a:a/0.7   b:b/0.3
    .o. (i.e., *)
p(Y | X):   a:C/0.1   a:D/0.9   b:C/0.8   b:D/0.2
    =
p(X,Y):     a:C/0.07  a:D/0.63  b:C/0.24  b:D/0.06

Note that p(x,y) sums to 1. Suppose y = "C"; what is the best "x"?
To condition on the observation, compose once more with an acceptor for the output: (Y = y) is the one-arc acceptor C:C/1, which restricts the result to paths compatible with output "C".

p(X) .o. p(Y | X) .o. (Y = y)  =  p(X, y):   a:C/0.07   b:C/0.24

The best path is b:C/0.24, so the best x is "b".
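A few lines of Python (my sketch, not part of the course materials) reproduce this arithmetic:

```python
# Verify the toy composition: p(x, y) = p(x) * p(y | x).
p_x = {"a": 0.7, "b": 0.3}
p_y_given_x = {("a", "C"): 0.1, ("a", "D"): 0.9,
               ("b", "C"): 0.8, ("b", "D"): 0.2}

p_xy = {(x, y): p_x[x] * p for (x, y), p in p_y_given_x.items()}
assert abs(sum(p_xy.values()) - 1.0) < 1e-9   # p(x,y) sums to 1

y = "C"
best_x = max(p_x, key=lambda x: p_xy[(x, y)])
print(best_x, round(p_xy[(best_x, y)], 2))    # -> b 0.24
```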
Edit Distance Transducer

(k = alphabet size; shown for the alphabet {a, b})
O(k) no-change arcs:       a:a   b:b
O(k²) substitution arcs:   a:b   b:a
O(k) deletion arcs:        a:ε   b:ε
O(k) insertion arcs:       ε:a   ε:b
Stochastic Edit Distance Transducer

Same arcs, now weighted:
O(k) identity arcs:        a:a   b:b
O(k²) substitution arcs:   a:b   b:a
O(k) deletion arcs:        a:ε   b:ε
O(k) insertion arcs:       ε:a   ε:b

Likely edits = high-probability arcs.
Stochastic Edit Distance Transducer: Best Path

Compose  clara .o. (edit transducer) .o. caca.  The result is a lattice whose paths are exactly the alignments of "clara" with "caca": each path multiplies the probabilities of its edit arcs (identities like a:a, substitutions like r:c, deletions like l:ε, insertions like ε:c). The best path, found by Dijkstra's algorithm, is the most probable edit sequence.

[Figure: the composed alignment lattice of edit arcs from "clara" to "caca" is omitted here.]
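Because the composed lattice is acyclic, the best path can equivalently be found by dynamic programming. Here is a minimal sketch (mine, using unit costs; replacing each 1 with -log of the corresponding arc probability makes the same recurrence compute the most probable edit sequence):

```python
# Minimum-cost edit path between two strings, with unit costs.
def edit_distance(x, y):
    n, m = len(x), len(y)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i                                  # delete all of x[:i]
    for j in range(1, m + 1):
        d[0][j] = j                                  # insert all of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1   # identity vs. substitution arc
            d[i][j] = min(d[i - 1][j] + 1,           # deletion arc  (x[i-1] : ε)
                          d[i][j - 1] + 1,           # insertion arc (ε : y[j-1])
                          d[i - 1][j - 1] + sub)     # diagonal arc  (x[i-1] : y[j-1])
    return d[n][m]

print(edit_distance("clara", "caca"))  # -> 2 (delete l, substitute r with c)
```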
Speech Recognition by FST Composition (Pereira & Riley 1996)

p(word seq): trigram language model
  .o.
p(phone seq | word seq): pronunciation model
  .o.
p(acoustics | phone seq): acoustic model
  .o.
observed acoustics
Word Segmentation

theprophetsaidtothecity

• What does this say? And what other words are substrings?
• Could segment with parsing (how?), but that is slow.
• Given L = a "lexicon" FSA that matches all English words: how do we apply it to this problem?
• What if the lexicon is weighted?
• From unigrams to bigrams?
• Smooth L to include unseen words?

(A weighted-segmentation sketch follows this list.)
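One way to realize the weighted-lexicon idea (a sketch under my own assumptions: the toy lexicon and its unigram probabilities are invented, and a real L would be a smoothed weighted FSA) is a Viterbi-style dynamic program over split points:

```python
import math

# Invented toy lexicon with unigram probabilities (a stand-in for a
# weighted lexicon FSA L, smoothed to include unseen words).
lexicon = {"the": 0.20, "prophet": 0.01, "said": 0.05,
           "to": 0.15, "city": 0.02, "he": 0.03, "prop": 0.001}
maxlen = max(map(len, lexicon))

def segment(text):
    # best[i] = (max log-probability of segmenting text[:i], backpointer)
    best = [(-math.inf, 0) for _ in range(len(text) + 1)]
    best[0] = (0.0, 0)
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - maxlen), i):
            w = text[j:i]
            if w in lexicon and best[j][0] > -math.inf:
                score = best[j][0] + math.log(lexicon[w])
                if score > best[i][0]:
                    best[i] = (score, j)
    words, i = [], len(text)        # follow backpointers to recover words
    while i > 0:
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return list(reversed(words))

print(segment("theprophetsaidtothecity"))
# -> ['the', 'prophet', 'said', 'to', 'the', 'city']
```

Moving from unigrams to bigrams would condition each word's score on its predecessor, turning the chart entries into (position, previous word) pairs.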
Spelling Correction

• Spelling correction also needs a lexicon L.
• But there is distortion ...
• Let T be a transducer that models common typos and other spelling errors:
  • ance ↔ ence (deliverance, ...)
  • e ↔ ee (deliverance, ...)
  • e ↔ ε / Cons _ Cons (athlete, ...)
  • rr ↔ r (embarrass, occurrence, ...)
  • ge ↔ dge (privilege, ...)
  • etc.
• Now what can you do with L .o. T?
• Should T and L have probabilities?
• We want T to include "all possible" errors ...

(A candidate-generation sketch follows this list.)
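As a rough illustration of what L .o. T buys you (my sketch: the rules and word list are invented stand-ins, string rewriting replaces true transducer composition, and a real system would weight everything):

```python
import re

# Stand-in lexicon L and typo rules T, approximating the arcs above.
lexicon = {"privilege", "embarrass", "occurrence", "deliverance", "athlete"}
rules = [("dge", "ge"), ("ge", "dge"), ("r", "rr"), ("rr", "r"),
         ("ance", "ence"), ("ence", "ance")]

def corrections(typo):
    # Apply each rewrite at each position; keep results that L accepts,
    # i.e., strings related to the typo by T that are real words.
    found = set()
    for old, new in rules:
        for m in re.finditer(old, typo):
            candidate = typo[:m.start()] + new + typo[m.end():]
            if candidate in lexicon:
                found.add(candidate)
    return found

print(corrections("priviledge"))  # -> {'privilege'}
print(corrections("embarass"))    # -> {'embarrass'}
```

With probabilities on both L and T, the same composition would rank candidates by p(word) * p(typo | word) instead of merely filtering them.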
Morpheme Segmentation

• Let L be a machine that matches all Turkish words.
• Same problem as word segmentation, just at a lower level: morpheme segmentation.
• Turkish word: uygarlaştıramadıklarımızdanmışsınızcasına
  = uygar+laş+tır+ma+dık+ları+mız+dan+mış+sınız+ca+sı+na
  "(behaving) as if you are among those whom we could not cause to become civilized"
• Some constraints on the morpheme sequence: bigram probabilities.
• Generative model: concatenate morphemes, then fix up the joints.
  • stop + -ing = stopping, fly + -s = flies, vowel harmony
  • Use a cascade of transducers to handle all the fixups (a sketch follows this list).
• But this is just morphology!
• Probabilities can be used here too (but people often don't).
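A minimal sketch of the concatenate-then-fix-up idea (my own toy rules covering only the two English examples above; a real cascade chains many such transducers, e.g. for vowel harmony):

```python
import re

# Each fix-up is one rewrite over the morpheme boundary "+"; applying
# them in order mimics a cascade of transducers.
fixups = [
    (re.compile(r"y\+s$"), "ies"),                              # fly + -s -> flies
    (re.compile(r"([aeiou])([bdgmnpt])\+ing$"), r"\1\2\2ing"),  # stop + -ing -> stopping
    (re.compile(r"\+"), ""),                                    # finally, erase boundaries
]

def join(*morphemes):
    word = "+".join(morphemes)
    for pattern, replacement in fixups:
        word = pattern.sub(replacement, word)
    return word

print(join("stop", "ing"))  # -> stopping
print(join("fly", "s"))     # -> flies
```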
More Engineering Applications

• Markup: dates, names, places, noun phrases; spelling/grammar errors?
• Hyphenation
• Informative templates for information extraction (FASTUS)
• Word segmentation (use probabilities!)
• Part-of-speech tagging (use probabilities, maybe!)
• Translation
• Spelling correction / edit distance
• Phonology, morphology: series of little fixups? constraints?
• Speech
• Transliteration / back-transliteration
• Machine translation?
• Learning ...