
Automatic Spelling Correction: Probability Models and Algorithms


Presentation Transcript


  1. Automatic Spelling Correction: Probability Models and Algorithms • Motivation and Formulation • Demonstration of a Prototype Program • The Underlying Probability Models • Algorithms for Automatic Correction • Conclusion

  2. Motivation and Formulation • A set of words: the vocabulary. • Single-word correction: Given any character string S that may or may not belong to the vocabulary, match S with the most likely word W in the vocabulary. • Example: vocabulary = {is, are, am}; iis → is, ae → are, an → am

  3. Motivation and Formulation • Multiple-word correction: Given a series of character strings S1 S2 … Sm, each of which may or may not belong to the vocabulary, match them with the most likely word series W1 W2 … Wm formed by words from the vocabulary. • Example: vocabulary = {I, is, are, am}; ii bn → I am

  4. Motivation and Formulation • Given a word w, what do we mean by the most likely word for w in the vocabulary? → This requires probability models. • How do we find the most likely word for w? → This requires algorithms.

  5. Probability Models: Typical Typos • Errors in the transition of mental states: repeating characters (iis → is), skipping characters (ae → are) • Mentally right, but the finger wrongly lands on a nearby key (an → am)

  6. Probability Models • The Word Model: for each word w, how do we probabilistically transition from one mental state of trying to type some character in the word to another? e.g. Ideally: a → r → e, but things like a → a → r → e (a repeated character) or a → e (a skipped character) could happen.
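The word model can be made concrete with a small sketch. The slides do not give transition probabilities, so `P_REPEAT`, `P_NEXT`, and `P_SKIP` below are assumed values, and `path_prob` is a hypothetical helper that scores one sequence of mental states (character positions) for a word. It allows only the three moves named on the slide (repeat, advance, skip one character) and does not renormalise at the start or end of the word.

```python
# Assumed word-model parameters (not taken from the slides).
P_REPEAT, P_NEXT, P_SKIP = 0.05, 0.90, 0.05

def path_prob(word_len, path):
    """Probability of visiting the character positions in `path`, in order,
    while trying to type a word of `word_len` characters.

    Position -1 is an implicit start state; position `word_len` is the end state.
    """
    if any(not 0 <= idx < word_len for idx in path):
        return 0.0
    prob = 1.0
    prev = -1
    for idx in path + [word_len]:      # finish by moving to the end state
        step = idx - prev
        if step == 0 and prev >= 0:
            prob *= P_REPEAT           # same character again, e.g. a -> a
        elif step == 1:
            prob *= P_NEXT             # the intended next character
        elif step == 2:
            prob *= P_SKIP             # one character skipped, e.g. a -> e
        else:
            return 0.0                 # move not allowed in this sketch
        prev = idx
    return prob

# The three paths from the slide for the word "are" (positions a=0, r=1, e=2).
ideal = path_prob(3, [0, 1, 2])        # a -> r -> e
repeated = path_prob(3, [0, 0, 1, 2])  # a -> a -> r -> e
skipped = path_prob(3, [0, 2])         # a -> e
print(ideal > repeated, ideal > skipped)
```

Under these assumed parameters the ideal path is far more probable than either error path, which is the behaviour the word model is meant to capture.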

  7. Probability Models • The Keyboard Model (analogous to the acoustic model in speech recognition): for a mental state of trying to type a character c in a word, what is the probability distribution over the keys actually touched? e.g. Ideally, when you want to type a you touch a, but you might touch b, q, z, s, w, x, …
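One possible shape for the keyboard model is sketched below. It assumes a QWERTY layout, treats keys within one row and one column of the intended key as "nearby", and spreads an assumed miss probability evenly over them. The layout approximation, the function names, and the value `p_correct=0.9` are all illustrative assumptions rather than the presentation's actual model.

```python
# Approximate QWERTY letter rows; row stagger is ignored in this sketch.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

def neighbors(key):
    """Return the set of keys within one row and one column of `key`."""
    for r, row in enumerate(QWERTY_ROWS):
        if key in row:
            c = row.index(key)
            break
    else:
        raise ValueError(f"unknown key: {key!r}")
    near = set()
    for rr in (r - 1, r, r + 1):
        if 0 <= rr < len(QWERTY_ROWS):
            for cc in (c - 1, c, c + 1):
                if 0 <= cc < len(QWERTY_ROWS[rr]):
                    near.add(QWERTY_ROWS[rr][cc])
    near.discard(key)
    return near

def keyboard_model(intended, p_correct=0.9):
    """Distribution over touched keys given the intended key (assumed form)."""
    near = neighbors(intended)
    dist = {intended: p_correct}
    for k in near:
        dist[k] = (1.0 - p_correct) / len(near)
    return dist

print(sorted(neighbors("a")))  # ['q', 's', 'w', 'x', 'z']
```

With this layout approximation, m is a neighbor of n, which is consistent with the an → am example.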

  8. Probability Models • The Language Model (i.e. the sentence model): how do we put words together to form sentences? • The language model is not strictly necessary for single-word correction, but it can further improve accuracy, for single-word and multiple-word correction alike, by taking context into account.

  9. Probability Models • The Language Model (i.e. the sentence model): for example, a bigram language model captures how likely each individual word is to appear in a sentence and how likely one word is to follow another. Such knowledge helps: e.g. if you see the two strings "I an", they are much more likely to have been generated from "I am" than from "I a".
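A bigram model of this kind can be estimated from counts. The sketch below uses a three-sentence toy corpus and add-one smoothing, both of which are assumptions made for illustration; the slides do not say how the prototype's language model was trained. `train_bigram` and `bigram_prob` are hypothetical names.

```python
from collections import Counter

def train_bigram(sentences):
    """Count how often each word follows each context word ("<s>" marks sentence start)."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        vocab.update(words[1:])
        contexts.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams, vocab

def bigram_prob(prev, word, contexts, bigrams, vocab):
    """P(word | prev) with add-one smoothing (an assumed smoothing choice)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

# Toy corpus, made up for this example.
corpus = ["I am here", "I am a student", "he is a teacher"]
contexts, bigrams, vocab = train_bigram(corpus)

p_am = bigram_prob("I", "am", contexts, bigrams, vocab)
p_a = bigram_prob("I", "a", contexts, bigrams, vocab)
print(p_am > p_a)  # True: "am" follows "I" more often than "a" does
```

Smoothing keeps unseen pairs such as "I a" from getting probability zero, so they remain possible but unlikely corrections.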

  10. Algorithms • Calculate the probability of generating a character string S of s characters when trying to type a word W of w characters. • O(sw^2) using dynamic programming • O(w^s) using a naïve approach
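The dynamic-programming calculation can be sketched as a forward pass over the word's character states, in the style of a hidden Markov model. The sketch assumes the three word-model moves from the earlier slides (repeat, advance, skip one character) and a simplified keyboard model in which every wrong key is equally likely. All numeric parameters and function names are assumptions. The inner sum deliberately ranges over all w previous states, so the work is proportional to s·w², matching the bound on the slide; summing over every possible state path instead would be exponential in s.

```python
P_REPEAT, P_NEXT, P_SKIP = 0.05, 0.90, 0.05  # assumed word-model parameters
P_HIT = 0.9                                   # assumed chance of hitting the intended key
N_OTHER_KEYS = 25                             # the other letters share the miss probability

def transition(i, j):
    """P(moving from state i to state j); -1 is the start state, len(W) the end state."""
    if j == i and i >= 0:
        return P_REPEAT
    if j == i + 1:
        return P_NEXT
    if j == i + 2:
        return P_SKIP
    return 0.0

def emission(intended, typed):
    """Simplified keyboard model: uniform over wrong keys (an assumption)."""
    return P_HIT if typed == intended else (1.0 - P_HIT) / N_OTHER_KEYS

def generation_prob(S, W):
    """P(typing string S | trying to type word W), computed by the forward algorithm."""
    s, w = len(S), len(W)
    if s == 0:
        return transition(-1, w)
    # alpha[j]: probability of having typed S[:t+1] with the last key aimed at W[j]
    alpha = [transition(-1, j) * emission(W[j], S[0]) for j in range(w)]
    for t in range(1, s):
        alpha = [
            sum(alpha[i] * transition(i, j) for i in range(w)) * emission(W[j], S[t])
            for j in range(w)
        ]
    return sum(alpha[i] * transition(i, w) for i in range(w))

print(generation_prob("iis", "is") > generation_prob("iis", "am"))  # True
```

The transitions are not renormalised at the word boundaries, so the values are best read as relative scores for comparing candidate words.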

  11. Algorithms • Single-word correction: Determine the most likely word from a vocabulary of v words (with at most w characters per word) for a string S of s characters. • For each word W in the vocabulary, calculate the probability of generating S from W, weighted by the individual word frequency, and pick the most likely one. • O(vsw^2) using dynamic programming
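The selection step amounts to scoring each vocabulary word by its frequency times the probability of generating S from it, then taking the maximum. In the sketch below the generation probability is replaced by a crude stand-in, an assumed per-edit error rate raised to the Levenshtein distance, so the example stays short. This is not the word and keyboard model from the slides, and the word frequencies are made up for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]

def likelihood(S, W, p_error=0.05):
    """Stand-in for P(S | W): each edit costs a factor of p_error (assumption)."""
    return p_error ** edit_distance(S, W)

def correct(S, word_freq):
    """Return the word W maximising word_freq[W] * likelihood(S, W)."""
    return max(word_freq, key=lambda W: word_freq[W] * likelihood(S, W))

# Hypothetical word frequencies for the slide's three-word vocabulary.
word_freq = {"is": 0.5, "are": 0.3, "am": 0.2}
for typo in ("iis", "ae", "an"):
    print(typo, "->", correct(typo, word_freq))
```

With these assumed frequencies the sketch reproduces the three corrections from the earlier example: iis → is, ae → are, an → am.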

  12. Algorithms • Multiple-word correction: Determine the most likely word series W1 W2 … Wm of m words from a vocabulary of v words (with at most w characters in each word) for m strings S1 S2 … Sm (with at most s characters in each string).
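The slide does not name an algorithm for this step. One standard choice, assumed here, is a Viterbi search over word positions that combines a bigram language model with per-string generation probabilities. In the sketch, `likelihoods[t][W]` stands for the probability of generating string St from word W, and all numbers in the example are made up for illustration. Every probability is assumed to be non-zero (for instance through smoothing), because the sketch works in log space.

```python
import math

def viterbi_words(likelihoods, start_prob, bigram_prob):
    """Most likely word series given per-position likelihoods and a bigram model.

    likelihoods[t][W]   ~ P(S_t | W)
    start_prob[W]       ~ P(W starts the sentence)
    bigram_prob[P][W]   ~ P(W follows P)
    """
    vocab = list(start_prob)
    # best[W] = (log-probability of the best series ending in W, that series)
    best = {W: (math.log(start_prob[W]) + math.log(likelihoods[0][W]), [W])
            for W in vocab}
    for t in range(1, len(likelihoods)):
        new_best = {}
        for W in vocab:
            score, prev = max(
                (best[P][0] + math.log(bigram_prob[P][W]), P) for P in vocab
            )
            new_best[W] = (score + math.log(likelihoods[t][W]), best[prev][1] + [W])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])[1]

# Illustrative numbers for the typed strings "ii bn" (all values are assumptions).
likelihoods = [
    {"I": 0.04, "is": 0.02, "are": 0.0001, "am": 0.0001},     # P("ii" | W)
    {"I": 0.0001, "is": 0.0001, "are": 0.0001, "am": 0.002},  # P("bn" | W)
]
start_prob = {"I": 0.4, "is": 0.2, "are": 0.2, "am": 0.2}
other = {"I": 0.4, "is": 0.2, "are": 0.2, "am": 0.2}
bigram_prob = {
    "I": {"I": 0.1, "is": 0.1, "are": 0.1, "am": 0.7},
    "is": other, "are": other, "am": other,
}
print(viterbi_words(likelihoods, start_prob, bigram_prob))  # ['I', 'am']
```

This search examines v² word pairs at each of the m positions, on top of the cost of computing the v generation probabilities for each string.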

  13. Conclusion • Similar modeling and analysis are applicable to speech recognition • Mathematical structures provide powerful tools for modeling and analysis • Design and analysis of algorithms are important to real-world problem solving • Mathematical structures and algorithms: two key components of modern AI research.
