Learn about statistical language models, probabilities of word sequences, N-gram models, smoothing techniques, and applications in NLP like speech recognition and machine translation. Discover how language models estimate probabilities and overcome data sparseness challenges. Explore the Zero-Frequency Problem, phonetic tree models, and the significance of n-grams in predicting symbols within text. Dive into whole-sentence language models and their versatility in incorporating various computational features.
Language Model for Machine Translation
Jang, HaYoung
What is a Language Model? • Probability distribution over strings of text • How likely is a string in a given “language”? • Probabilities depend on what language we’re modeling
• p1 = P(“a quick brown dog”)
• p2 = P(“dog quick a brown”)
• p3 = P(“быстрая brown dog”)
• p4 = P(“быстрая собака”) (Russian for “a quick dog”)
In a language model for English: p1 > p2 > p3 > p4
In a language model for Russian: p1 < p2 < p3 < p4
Language Model from Wikipedia • A statistical language model assigns a probability to a sequence of words P(w1..n) by means of a probability distribution. • Language modeling is used in many natural language processing applications such as speech recognition, machine translation, part-of-speech tagging, parsing and information retrieval. Estimating the probability of sequences can become difficult because phrases or sentences in a corpus can be arbitrarily long, so some sequences are never observed during training of the language model (the data sparseness problem). For that reason these models are often approximated using smoothed N-gram models. • In speech recognition and in data compression, such a model tries to capture the properties of a language and to predict the next word in a speech sequence. • When used in information retrieval, a language model is associated with each document in a collection. With query Q as input, retrieved documents are ranked by the probability that the document's language model would generate the terms of the query, P(Q|Md).
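In formulas (a standard textbook rendering, not taken verbatim from the slides), the probability P(w1..n) above is factored with the chain rule and then approximated with an N-gram Markov assumption:

% Chain-rule factorization of the word sequence w_1 .. w_n
P(w_{1..n}) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})

% N-gram approximation: condition each word only on its N-1 predecessors,
% e.g. a bigram model (N = 2) uses P(w_i | w_{i-1})
P(w_{1..n}) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-N+1}, \ldots, w_{i-1})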
Unigram Language Model • Colored balls are randomly drawn from an urn (with replacement); a text of M words is modeled the same way.
[Figure: an urn of 9 colored balls and a drawn sequence of four balls]
P(sequence) = P(ball1) P(ball2) P(ball3) P(ball4) = (4/9) (2/9) (4/9) (3/9)
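A minimal sketch of this urn model as a unigram language model; the toy corpus and function names below are illustrative, not from the slides:

from collections import Counter

def train_unigram(corpus_tokens):
    # Maximum-likelihood estimate: P(w) = count(w) / total number of tokens
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sentence_prob(model, tokens):
    # Unigram assumption: tokens are drawn independently, like balls from an urn
    p = 1.0
    for w in tokens:
        p *= model.get(w, 0.0)  # unseen tokens get probability 0 (see next slide)
    return p

# Toy "urn": 9 draws, mirroring the 4/9, 2/9, 3/9 proportions on the slide
corpus = ["red"] * 4 + ["blue"] * 2 + ["green"] * 3
model = train_unigram(corpus)
print(sentence_prob(model, ["red", "blue", "red", "green"]))  # (4/9)(2/9)(4/9)(3/9)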
Zero-Frequency Problem • Suppose some event is not in our observation S • Model will assign zero probability to that event
[Figure: a model M over three ball types with P = 1/2, 1/4, 1/4, and an observed sequence S containing a fourth, unseen type]
P(S | M) = (1/2) (1/4) 0 (1/4) = 0
Smoothing • The solution: “smooth” the word probabilities P(w)
[Figure: maximum likelihood estimate vs. smoothed probability distribution, plotted over words w]
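The slide does not commit to a particular smoothing method; the sketch below uses add-one (Laplace) smoothing, one of the simplest choices, with an illustrative vocabulary:

from collections import Counter

def laplace_unigram(corpus_tokens, vocab):
    # Add-one smoothing: every vocabulary word gets at least count 1,
    # so no event receives zero probability
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

corpus = ["red"] * 4 + ["blue"] * 2 + ["green"] * 3
vocab = {"red", "blue", "green", "yellow"}   # "yellow" is unseen in training
model = laplace_unigram(corpus, vocab)
print(model["yellow"])   # 1/13 instead of 0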
Phonetic Tree with n-gram Model
[Figure: a phonetic tree for “Tell the …”, with branch probabilities (e.g. 1, 0.5, 0.1, 0.02) shown under a unigram, a bigram, and a trigram model]
n-grams • n-gram: a sequence of n symbols • n-gram language model: a model that predicts a symbol in a sequence given its n-1 predecessors • Why use them? To estimate the probability of a symbol in unknown text from the frequency of its occurrence in known text
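A small sketch of the n = 2 case (a bigram model), predicting a symbol from its single predecessor; the toy corpus echoes the “Tell the …” example from the phonetic-tree slide and is purely illustrative:

from collections import Counter, defaultdict

def train_bigram(tokens):
    # Count each symbol together with its one predecessor (n-1 = 1 for bigrams)
    following = defaultdict(Counter)
    for prev, cur in zip(tokens, tokens[1:]):
        following[prev][cur] += 1
    return following

def predict_next(model, prev):
    # Most likely next symbol given its predecessor, with its MLE probability
    counts = model[prev]
    word, c = counts.most_common(1)[0]
    return word, c / sum(counts.values())

tokens = "tell the truth tell the time tell the truth".split()
model = train_bigram(tokens)
print(predict_next(model, "the"))   # ('truth', 2/3)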
Problems with n-grams • More possible n-grams than can ever be observed in training data • Sensitivity to the genre of the training text (e.g. newspaper articles vs. personal letters) • Fixed n-gram vocabulary: any additions require re-compiling the n-gram model
Whole-Sentence Language Model • The main advantage of the whole-sentence maximum entropy (WSME) model is its ability to freely incorporate arbitrary computational features into a single statistical model. The features can be: • Traditional N-gram features (bigram, trigram) • Long-distance N-grams (triggers, distance-2 n-grams) • Class-based N-grams • Syntactic features (PCFG, link grammar, dependency information) • Other features (sentence length, dialogue features, etc.)
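For reference, the whole-sentence maximum entropy model is usually written in the exponential form below (standard in the WSME literature, not spelled out on the slide); it makes clear why arbitrary sentence-level features f_i can be added freely:

% P_0(s): a baseline model of sentence s (typically an N-gram model)
% f_i(s): arbitrary feature functions of the whole sentence
% \lambda_i: feature weights estimated from data; Z normalizes over sentences
P(s) = \frac{1}{Z} \, P_0(s) \, \exp\Big( \sum_i \lambda_i f_i(s) \Big)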