1 / 64

Applications (2 of 2): Recognition, Transduction, Discrimination, Segmentation, Alignment, etc.

Applications (2 of 2): Recognition, Transduction, Discrimination, Segmentation, Alignment, etc. Kenneth Church Kenneth.Church@jhu.edu. Applications. Recognition: Shannon’s Noisy Channel Model Speech, Optical Character Recognition (OCR), Spelling Transduction Part of Speech (POS) Tagging

ada
Download Presentation

Applications (2 of 2): Recognition, Transduction, Discrimination, Segmentation, Alignment, etc.

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Applications (2 of 2):Recognition, Transduction, Discrimination, Segmentation, Alignment, etc. Kenneth Church Kenneth.Church@jhu.edu

  2. Applications • Recognition: Shannon’s Noisy Channel Model • Speech, Optical Character Recognition (OCR), Spelling • Transduction • Part of Speech (POS) Tagging • Machine Translation (MT) • Parsing: ??? • Ranking • Information Retrieval (IR) • Lexicography • Discrimination: • Sentiment, Text Classification, Author Identification, Word Sense Disambiguation (WSD) • Segmentation • Asian Morphology (Word Breaking), Text Tiling • Alignment: Bilingual Corpora, Dotplots • Compression • Language Modeling: good for everything

  3. Speech  LanguageShannon’s: Noisy Channel Model Channel Model Language Model • I  Noisy Channel  O • I΄ ≈ ARGMAXI Pr(I|O) = ARGMAXI Pr(I) Pr(O|I) Application Independent

  4. Speech  LanguageUsing (Abusing) Shannon’s Noisy Channel Model: Part of Speech Tagging and Machine Translation • Speech • Words  Noisy Channel  Acoustics • OCR • Words  Noisy Channel  Optics • Spelling Correction • Words  Noisy Channel  Typos • Part of Speech Tagging (POS): • POS  Noisy Channel  Words • Machine Translation: “Made in America” • English  Noisy Channel  French Didn’t have the guts to use this slide at Eurospeech (Geneva)

  5. Spelling Correction

  6. Evaluation

  7. Performance

  8. The Task is Hard without Context

  9. Easier with Context • actuall, actual, actually • … in determining whether the defendant actually will die. • constuming, consuming, costuming • conviced, convicted, convinced • confusin, confusing, confusion • workern, worker, workers

  10. Easier with Context

  11. Context Model

  12. Future Improvements • Add More Factors • Trigrams • Thesaurus Relations • Morphology • Syntactic Agreement • Parts of Speech • Improve Combination Rules • Shrink (Meaty Methodology)

  13. Conclusion (Spelling Correction) • There has been a lot of interest in smoothing • Good-Turing estimation • Knesser-Ney • Is it worth the trouble? • Ans: Yes (at least for recognition applications)

  14. Aligning Words

More Related