
Language Model for Machine Translation


Presentation Transcript


  1. Language Model for Machine Translation Jang, HaYoung

  2. What is a Language Model? • Probability distribution over strings of text • How likely is a string in a given “language”? • Probabilities depend on what language we’re modeling: p1 = P(“a quick brown dog”), p2 = P(“dog quick a brown”), p3 = P(“быстрая brown dog”), p4 = P(“быстрая собака”) (Russian for “quick dog”) • In a language model for English: p1 > p2 > p3 > p4 • In a language model for Russian: p1 < p2 < p3 < p4
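A minimal sketch of this point in code, using a toy bigram model with made-up probabilities (the table below is purely illustrative, not from the slides): the fluent English word order scores higher than the scrambled one.

import math

# P(word | previous word); "<s>" marks the start of the string.
# All numbers are invented for demonstration only.
BIGRAM_P = {
    ("<s>", "a"): 0.4, ("a", "quick"): 0.3, ("quick", "brown"): 0.5,
    ("brown", "dog"): 0.4, ("<s>", "dog"): 0.05, ("dog", "quick"): 0.01,
    ("quick", "a"): 0.01, ("a", "brown"): 0.05,
}
UNSEEN = 1e-6  # tiny floor so unseen pairs are not exactly zero

def log_prob(sentence):
    """Log-probability of a sentence under the toy bigram model."""
    words = ["<s>"] + sentence.split()
    return sum(math.log(BIGRAM_P.get((w1, w2), UNSEEN))
               for w1, w2 in zip(words, words[1:]))

p1 = log_prob("a quick brown dog")
p2 = log_prob("dog quick a brown")
print(p1 > p2)  # True: the English-like word order is more probable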

  3. Language Model from Wikipedia • A statistical language model assigns a probability to a sequence of words P(w1..n) by means of a probability distribution. • Language modeling is used in many natural language processing applications such as speech recognition, machine translation, part-of-speech tagging, parsing and information retrieval. Estimating the probability of sequences can become difficult in corpora in which phrases or sentences can be arbitrarily long, so some sequences are never observed during training of the language model (the data sparseness problem). For that reason these models are often approximated using smoothed N-gram models. • In speech recognition and in data compression, such a model tries to capture the properties of a language, and to predict the next word in a speech sequence. • When used in information retrieval, a language model is associated with a document in a collection. With query Q as input, retrieved documents are ranked based on the probability that the document's language model would generate the terms of the query, P(Q|Md).
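The n-gram approximation mentioned above can be sketched as follows. The full chain rule P(w1..wn) = Π P(wi | w1..wi-1) is approximated by truncating each history to the last n-1 words; cond_prob is a hypothetical stand-in for conditional probabilities that would be estimated from a training corpus.

def sequence_prob(words, n, cond_prob):
    """P(w1..wm) under an order-n model; cond_prob(word, history_tuple) -> float."""
    prob = 1.0
    for i, w in enumerate(words):
        history = tuple(words[max(0, i - (n - 1)):i])  # keep only the n-1 predecessors
        prob *= cond_prob(w, history)
    return prob

# Example with a deliberately uniform stand-in distribution over a 1000-word vocabulary:
print(sequence_prob("a quick brown dog".split(), n=2, cond_prob=lambda w, h: 1 / 1000))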

  4. Unigram Language Model • Colored balls are randomly drawn from an urn (with replacement) • Under a unigram model M over words, the probability of the drawn sequence is the product of the individual draw probabilities: P(sequence) = P(w1) × P(w2) × P(w3) × P(w4) = (4/9) × (2/9) × (4/9) × (3/9)
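In code, the urn example works out like this (reading the slide's figure as 4, 2, and 3 balls of three colors out of 9; the color names are an assumption):

from functools import reduce

# With replacement, each draw is independent, so the sequence probability is
# just the product of the per-color probabilities.
unigram = {"red": 4/9, "blue": 3/9, "green": 2/9}
sequence = ["red", "green", "red", "blue"]

prob = reduce(lambda acc, ball: acc * unigram[ball], sequence, 1.0)
print(prob)  # (4/9) * (2/9) * (4/9) * (3/9) ≈ 0.0146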

  5. Zero-Frequency Problem • Suppose some event is not in our observation S • The model M will assign zero probability to that event: with estimated probabilities 1/2, 1/4, and 1/4 for the three observed ball colors, a sequence S containing an unseen color gets P(S) = (1/2) × (1/4) × 0 × (1/4) = 0
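The same calculation as a short sketch (color names again assumed for illustration):

# An event never seen in the observed sample gets probability 0 under the
# maximum-likelihood model, which forces any sequence containing it to 0 as well.
model = {"red": 1/2, "blue": 1/4, "green": 1/4}  # estimated from S; "yellow" unseen

sequence = ["red", "blue", "yellow", "green"]
prob = 1.0
for ball in sequence:
    prob *= model.get(ball, 0.0)  # the unseen event contributes a factor of 0
print(prob)  # 0.0, i.e. the whole sequence is judged impossible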

  6. Smoothing • The solution: “smooth” the word probabilities P(w) • [Figure: maximum likelihood estimate vs. smoothed probability distribution over words w]
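The slide only contrasts the maximum-likelihood estimate with a smoothed curve and does not name a method; add-one (Laplace) smoothing is one simple illustrative choice, moving a little probability mass from seen words to unseen ones:

def laplace_smoothed(counts, vocab_size):
    """Turn raw counts into a smoothed distribution: (c + 1) / (N + |V|)."""
    total = sum(counts.values())
    return lambda w: (counts.get(w, 0) + 1) / (total + vocab_size)

counts = {"red": 2, "blue": 1, "green": 1}   # observed sample S
p = laplace_smoothed(counts, vocab_size=4)   # assume a 4-word vocabulary incl. "yellow"
print(p("red"), p("yellow"))                 # 0.375 and 0.125, so no more zeros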

  7. Phonetic Tree with n-gram Model • [Figure: a phonetic prefix tree (T, R, U, E, L, TH nodes) annotated with unigram, bigram, and trigram branch probabilities such as 0.5, 0.1, and 0.02; the higher-order models condition those probabilities on the preceding words “the” and “Tell the”]

  8. n-grams • n-gram • A sequence of n symbols • n-gram Language Model • A model to predict a symbol in a sequence, given its n-1 predecessors • Why use them? • Estimate the probability of a symbol in unknown text, given the frequency of its occurrence in known text
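A few lines of Python make the definition concrete: every window of n consecutive symbols is an n-gram, and the model predicts the last symbol of each window from the other n-1.

def ngrams(symbols, n):
    """All windows of n consecutive symbols in the sequence."""
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]

tokens = "tell the truth".split()
print(ngrams(tokens, 2))  # [('tell', 'the'), ('the', 'truth')]
print(ngrams(tokens, 3))  # [('tell', 'the', 'truth')]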

  9. Creating n-gram LMs
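One common way to create an n-gram LM is to count n-grams and their (n-1)-gram prefixes in a training corpus and take their ratio as the conditional probability; a minimal bigram version (maximum-likelihood, no smoothing) might look like this:

from collections import Counter

def train_bigram_lm(sentences):
    bigram_counts, unigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigram_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
    # P(w2 | w1) = count(w1, w2) / count(w1)
    return {(w1, w2): c / unigram_counts[w1] for (w1, w2), c in bigram_counts.items()}

lm = train_bigram_lm(["tell the truth", "tell the time"])
print(lm[("tell", "the")], lm[("the", "truth")])  # 1.0 and 0.5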

  10. Problems with n-grams • More n-grams exist than can ever be observed • Sensitivity to the genre of the training text • Newspaper articles • Personal letters • Fixed n-gram vocabulary • Any additions lead to re-compilation of the n-gram model

  11. Whole-Sentence Language Model • The main advantage of the whole-sentence maximum-entropy (WSME) model is its ability to freely incorporate arbitrary computational features into a single statistical model. The features can be: • Traditional N-gram features (bigram, trigram) • Long-distance N-grams (triggers, distance-2 n-grams) • Class-based N-grams • Syntactic features (PCFG, link grammar, dependency info.) • Other features (sentence length, dialogue features, etc.)
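A rough sketch of the whole-sentence idea: the (unnormalized) score of a sentence is a log-linear combination of arbitrary sentence-level feature functions. The feature functions and weights below are invented for illustration only; a real WSME model learns the weights and divides by a normalizing constant.

import math

def features(sentence):
    tokens = sentence.split()
    return {
        "length": len(tokens),                                             # sentence-length feature
        "has_trigger_pair": float("dog" in tokens and "bark" in tokens),   # long-distance trigger feature
    }

weights = {"length": -0.1, "has_trigger_pair": 2.0}  # assumed, not trained

def unnormalized_score(sentence):
    """exp(sum_i lambda_i * f_i(s)), the numerator of the WSME probability."""
    return math.exp(sum(weights[name] * value for name, value in features(sentence).items()))

print(unnormalized_score("the dog did bark"))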

  12. References • Katz, S. Estimation of probabilities from sparse data for the language model component of a speech recognizer. • Brown, P. F., deSouza, P. V., Mercer, R. L., Della Pietra, V. J., and Lai, J. C. Class-based n-gram models of natural language. • Mishne, G., Carmel, D., and Lempel, R. Blocking blog spam with language model disagreement. In: AIRWeb '05 - First International Workshop on Adversarial Information Retrieval on the Web, at the 14th International World Wide Web Conference (WWW2005), 2005.
