1 / 11

Spelling Correction and the Noisy Channel

Spelling Correction and the Noisy Channel. Real-Word Spelling Correction. Real-word spelling errors. …leaving in about fifteen minuets to go to her house. The design an construction of the system… Can they lave him my messages? The study was conducted mainly be John Black .

astra
Download Presentation

Spelling Correction and the Noisy Channel

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Spelling Correction and the Noisy Channel Real-Word Spelling Correction

  2. Real-word spelling errors • …leaving in about fifteen minuetsto go to her house. • The design anconstruction of the system… • Can they lavehim my messages? • The study was conducted mainly beJohn Black. • 25-40% of spelling errors are real words Kukich 1992

  3. Solving real-world spelling errors • For each word in sentence • Generate candidate set • the word itself • all single-letter edits that are English words • words that are homophones • Choose best candidates • Noisy channel model • Task-specific classifier

  4. Noisy channel for real-word spell correction • Given a sentence w1,w2,w3,…,wn • Generate a set of candidates for each word wi • Candidate(w1) = {w1, w’1 , w’’1, w’’’1 ,…} • Candidate(w2) = {w2, w’2, w’’2, w’’’2 ,…} • Candidate(wn) = {wn, w’n, w’’n, w’’’n,…} • Choose the sequence W that maximizes P(W)

  5. Noisy channel for real-word spell correction

  6. Noisy channel for real-word spell correction

  7. Simplification: One error per sentence • Out of all possible sentences with one word replaced • w1, w’’2,w3,w4twooffthew • w1,w2,w’3,w4 two of the • w’’’1,w2,w3,w4 tooof thew • … • Choose the sequence W that maximizes P(W)

  8. Where to get the probabilities • Language model • Unigram • Bigram • Etc • Channel model • Same as for non-word spelling correction • Plus need probability for no error, P(w|w)

  9. Probability of no error • What is the channel probability for a correctly typed word? • P(“the”|“the”) • Obviously this depends on the application • .90 (1 error in 10 words) • .95 (1 error in 20 words) • .99 (1 error in 100 words) • .995 (1 error in 200 words)

  10. Peter Norvig’s “thew” example

  11. Spelling Correction and the Noisy Channel Real-Word Spelling Correction

More Related