Statistical Translation and Web Search Ranking Jianfeng Gao Natural language processing, MSR July 22, 2011
Who should be here? • Interested in statistical machine translation and Web search ranking • Interested in modeling technologies • Looking for topics for your master's or PhD thesis • A difficult topic: very hard to beat a simple baseline • An easy topic: others cannot beat it either
Outline • Probability • Statistical Machine Translation (SMT) • SMT for Web search ranking
Probability (1/2) • Probability space: $0 \le P(x) \le 1$, with $\sum_x P(x) = 1$ • Cannot say $P(x) > 1$ or $P(x) < 0$ • Joint probability: $P(x, y)$ • Probability that x and y are both true • Conditional probability: $P(y \mid x)$ • Probability that y is true when we already know x is true • Independence: $P(x, y) = P(x)\,P(y)$ • x and y are independent
Probability (2/2) • $H$: assumptions on which the probabilities are based • Product rule – from the def of conditional probability: $P(x, y \mid H) = P(x \mid H)\,P(y \mid x, H)$ • Sum rule – a rewrite of the marginal probability def: $P(x \mid H) = \sum_y P(x, y \mid H)$ • Bayes rule – from the product rule: $P(y \mid x, H) = \dfrac{P(x \mid y, H)\,P(y \mid H)}{P(x \mid H)}$
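To make the three rules concrete, here is a toy numeric check; the numbers are made up for illustration, and the conditioning on $H$ is dropped for brevity:

```latex
% Assume P(x) = 0.5, P(y|x) = 0.4, P(y|not x) = 0.2.
\begin{aligned}
P(x, y)     &= P(x)\,P(y \mid x) = 0.5 \times 0.4 = 0.2
            && \text{(product rule)} \\
P(y)        &= P(x, y) + P(\neg x, y) = 0.2 + 0.5 \times 0.2 = 0.3
            && \text{(sum rule)} \\
P(x \mid y) &= \frac{P(y \mid x)\,P(x)}{P(y)} = \frac{0.4 \times 0.5}{0.3} = \tfrac{2}{3}
            && \text{(Bayes rule)}
\end{aligned}
```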
Statistical Language Modeling (SLM) • Model form • capture language structure via a probabilistic model • Model parameters • estimation of free parameters using training data
Model Form • How to incorporate language structure into a probabilistic model • Task: next word prediction • Fill in the blank: “The dog of our neighbor ___” • Starting point: word n-gram model • Very simple, yet surprisingly effective • Words are generated from left-to-right • Assumes no other structure than words themselves
Word N-gram Model • Word-based model: $P(W) = P(w_1 w_2 \ldots w_n)$ • Using the chain rule on its history (= preceding words): $P(w_1 w_2 \ldots w_n) = P(w_1)\,P(w_2 \mid w_1) \cdots P(w_n \mid w_1 \ldots w_{n-1}) = \prod_{i=1}^{n} P(w_i \mid w_1 \ldots w_{i-1})$
Word N-gram Model • How do we get probability estimates? • Get text and count! • Problem of using the whole history • Rare events: unreliable probability estimates • Assuming a vocabulary of 20,000 words, a bigram model already has $20{,}000^2 = 4 \times 10^8$ parameters, and a trigram model $20{,}000^3 = 8 \times 10^{12}$ • From Manning and Schütze 1999: 194
Word N-gram Model • Markov independence assumption • A word depends only on the N−1 preceding words • N = 3 → word trigram model • Reduce the number of parameters in the model • By forming equivalence classes • Word trigram model: $P(w_i \mid w_1 \ldots w_{i-1}) \approx P(w_i \mid w_{i-2}\, w_{i-1})$
Model Parameters • Bayesian estimation paradigm • Maximum likelihood estimation (MLE) • Smoothing in N-gram language models
Bayesian Paradigm • $P(\theta \mid D)$ – Posterior probability • $P(D \mid \theta)$ – Likelihood • $P(\theta)$ – Prior probability • $P(D)$ – Marginal probability • $P(\theta \mid D) = \dfrac{P(D \mid \theta)\,P(\theta)}{P(D)}$ • Likelihood versus probability: $P(D \mid \theta)$ • for fixed $\theta$, defines a probability over $D$; • for fixed $D$, defines the likelihood of $\theta$. • Never say "the likelihood of the data" • Always say "the likelihood of the parameters given the data"
Maximum Likelihood Estimation (MLE) • $\theta$: model; $D$: data • $\hat\theta = \arg\max_\theta P(\theta \mid D) = \arg\max_\theta P(D \mid \theta)\,P(\theta) / P(D)$ • Assume a uniform prior $P(\theta)$ • $P(D)$ is independent of $\theta$, and is dropped • $\hat\theta = \arg\max_\theta P(D \mid \theta)$, where $P(D \mid \theta)$ is the likelihood of parameter $\theta$ • Key difference between MLE and Bayesian estimation • MLE assumes that $\theta$ is fixed but unknown • Bayesian estimation assumes that $\theta$ itself is a random variable with a prior distribution $P(\theta)$
MLE for Trigram LM • It is easy – let us get some real text and start to count: $P(w_3 \mid w_1 w_2) = \dfrac{\text{count}(w_1 w_2 w_3)}{\text{count}(w_1 w_2)}$ • But why is this the MLE solution?
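A minimal Python sketch of this counting recipe, on a made-up one-sentence corpus (no smoothing yet, so unseen trigrams get probability zero):

```python
from collections import Counter

# Toy corpus; a real LM is trained on a much larger corpus.
corpus = "the dog of our neighbor barks".split()
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))   # trigram counts
bi = Counter(zip(corpus, corpus[1:]))                # bigram (history) counts

def p_mle(w1, w2, w3):
    # P(w3 | w1 w2) = count(w1 w2 w3) / count(w1 w2)
    return tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0

print(p_mle("dog", "of", "our"))   # 1.0 on this tiny corpus
```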
Derivation of MLE for N-gram • Homework – an MSR interview question • Hints • This is a constrained optimization problem • Use the log likelihood as the objective function • Assume a multinomial distribution for the LM • Introduce a Lagrange multiplier for the constraints
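For reference, one way the derivation can go; this is only a sketch under the multinomial assumption from the hints, so stop here if you want to work it out yourself:

```latex
% Maximize the log likelihood of counts c(h, w) for a fixed history h,
% with a Lagrange multiplier for the constraint sum_w P(w|h) = 1:
\begin{aligned}
\mathcal{L} &= \sum_{w} c(h, w) \log P(w \mid h)
               + \lambda \Big( 1 - \sum_{w} P(w \mid h) \Big) \\
\frac{\partial \mathcal{L}}{\partial P(w \mid h)}
            &= \frac{c(h, w)}{P(w \mid h)} - \lambda = 0
               \;\Rightarrow\; P(w \mid h) = \frac{c(h, w)}{\lambda} \\
\sum_{w} P(w \mid h) &= 1
               \;\Rightarrow\; \lambda = \sum_{w'} c(h, w'),
               \qquad P(w \mid h) = \frac{c(h, w)}{\sum_{w'} c(h, w')}
\end{aligned}
```

With $h = w_1 w_2$, this recovers the counting formula on the previous slide.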
Sparse Data Problem • Say our vocabulary size is $|V|$ • There are $|V|^3$ parameters in the trigram LM • $|V| = 20{,}000 \Rightarrow 20{,}000^3 = 8 \times 10^{12}$ parameters • Most trigrams have a zero count even in a large text corpus • oops…
Smoothing: Adding One • Add-one smoothing (from the Bayesian paradigm): $P(w_3 \mid w_1 w_2) = \dfrac{\text{count}(w_1 w_2 w_3) + 1}{\text{count}(w_1 w_2) + |V|}$ • But works very badly – do not use this • Add-delta smoothing: $P(w_3 \mid w_1 w_2) = \dfrac{\text{count}(w_1 w_2 w_3) + \delta}{\text{count}(w_1 w_2) + \delta |V|}$ • Still very bad – do not use this
Smoothing: Backoff • Backoff trigram to bigram, bigram to unigram: $P(w_3 \mid w_1 w_2) = \dfrac{\text{count}(w_1 w_2 w_3) - D}{\text{count}(w_1 w_2)}$ if $\text{count}(w_1 w_2 w_3) > 0$, and $\alpha(w_1 w_2)\,P(w_3 \mid w_2)$ otherwise • $D \in (0, 1)$ is a discount constant – absolute discounting • $\alpha$ is calculated so that the probabilities sum to 1 (homework) • Simple and effective – use this one!
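A minimal Python sketch of the trigram-to-bigram backoff step, continuing the toy setup above; the lower-order model here is plain MLE, and computing the fully normalized $\alpha$ is exactly the homework, so this sketch only redistributes the discounted mass approximately:

```python
from collections import Counter

corpus = "the dog of our neighbor barks at the dog of the butcher".split()
uni = Counter(corpus)
bi = Counter(zip(corpus, corpus[1:]))
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))
D = 0.5                                   # absolute-discount constant in (0, 1)

def p_bi(w2, w3):
    # Lower-order model: MLE bigram (a full LM would back off to unigrams too).
    return bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0

def p_tri_backoff(w1, w2, w3):
    h = bi[(w1, w2)]                      # count of the history w1 w2
    if tri[(w1, w2, w3)] > 0:
        return (tri[(w1, w2, w3)] - D) / h        # discounted ML estimate
    seen = sum(1 for (a, b, _) in tri if (a, b) == (w1, w2))
    alpha = D * seen / h if h else 1.0    # mass freed by discounting; the
    return alpha * p_bi(w2, w3)           # homework: normalize it exactly
```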
Outline • Probability • SMT and translation models • SMT for Web search ranking
SMT • C: 救援人员在倒塌的房屋里寻找生还者 • E: Rescue workers search for survivors in collapsed houses
Translation process (generative story) • C is broken into translation units • Each unit is translated into English • Glue translated units to form E • Translation models • Word-based models • Phrase-based models • Syntax-based models
Generative Modeling • Art → Story • Science → Math • Engineering → Code
Generative Modeling for SMT • Story making • how a target sentence is generated from a source sentence, step by step • Mathematical formulation • modeling each generation step in the generative story using a probability distribution • Parameter estimation • implementing an effective way of estimating the probability distributions from training data
Word-Based Models: IBM Model 1 • We first choose a length $m$ for the target sentence $E$, according to the distribution $P(m \mid C)$. • Then, for each position $j = 1, \ldots, m$ in the target sentence, we choose a position $a_j$ in the source sentence $C = c_1 \ldots c_n$ from which to generate the $j$-th target word, according to the distribution $P(a_j \mid C)$. • Finally, we generate the target word $e_j$ by translating $c_{a_j}$, according to the distribution $t(e_j \mid c_{a_j})$.
Mathematical Formulation • Assume that the choice of the length $m$ is independent of $C$ and $n$: $P(m \mid C) = \epsilon$ • Assume that all positions in the source sentence (including the empty word at position 0) are equally likely to be chosen: $P(a_j \mid C) = \dfrac{1}{n + 1}$ • Assume that each target word is generated independently from $c_{a_j}$: $P(E \mid C) = \dfrac{\epsilon}{(n + 1)^m} \prod_{j=1}^{m} \sum_{i=0}^{n} t(e_j \mid c_i)$
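A minimal Python sketch of this Model 1 formula; the translation-table values below are made up for illustration, not learned ones:

```python
import math

def model1_logprob(src, tgt, t, epsilon=1.0):
    # log P(E|C) = log( epsilon / (n+1)^m * prod_j sum_i t(e_j | c_i) )
    src = ["NULL"] + src                      # position 0: the empty word
    logp = math.log(epsilon) - len(tgt) * math.log(len(src))
    for e in tgt:
        logp += math.log(sum(t.get((e, c), 0.0) for c in src) or 1e-12)
    return logp

# Toy t-table entries (made-up values):
t = {("search", "寻找"): 0.8, ("for", "寻找"): 0.1, ("survivors", "生还者"): 0.9}
print(model1_logprob("寻找 生还者".split(), "search for survivors".split(), t))
```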
Parameter Estimation • Model form: $t(e \mid c)$ • MLE on word-aligned training data: $t(e \mid c) = \dfrac{\text{count}(c, e)}{\sum_{e'} \text{count}(c, e')}$ • Don't forget smoothing
Mathematical Formulation (phrase-based models) • Assume a uniform probability over segmentations $S$ • Use the maximum approximation to the sum: $P(E \mid C) \approx \max_S P(E, S \mid C)$ • Assume each phrase is translated independently, and use a distance-based reordering model: $P(E, S \mid C) \approx \prod_k P(\tilde e_k \mid \tilde c_k)\, d(\text{start}_k - \text{end}_{k-1} - 1)$
Parameter Estimation (phrase-based models) • MLE: $P(\tilde e \mid \tilde c) = \dfrac{\text{count}(\tilde c, \tilde e)}{\sum_{\tilde e'} \text{count}(\tilde c, \tilde e')}$ • Don't forget smoothing
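A minimal Python sketch of this relative-frequency estimate; the pair list is a made-up stand-in for phrase pairs extracted from word-aligned data:

```python
from collections import Counter

pairs = [                                  # toy extraction output
    ("倒塌的 房屋", "collapsed houses"),
    ("倒塌的 房屋", "collapsed houses"),
    ("倒塌的 房屋", "collapsed homes"),
    ("生还者", "survivors"),
]
pair_count = Counter(pairs)
src_count = Counter(c for c, _ in pairs)

def p_phrase(e, c):
    # P(e~ | c~) = count(c~, e~) / count(c~); smoothing omitted in this sketch
    return pair_count[(c, e)] / src_count[c]

print(p_phrase("collapsed houses", "倒塌的 房屋"))   # 2/3
```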
Story (syntax-based models) • Parse an input Chinese sentence into a parse tree • Translate each Chinese constituent into English, e.g., VP → ⟨PP 寻找 NP, search for NP PP⟩ • Glue these English constituents into a well-formed English sentence.
Other Two Tasks? • Mathematical formulation • Based on synchronous context-free grammar (SCFG) • Parameter estimation • Learning SCFG rules from data • Homework • Let us go through an example (thanks to Michel Galley) • Hierarchical phrase model • Linguistically syntax-based models
[Figure: word alignment between 救援 人员 在 倒塌 的 房屋 里 寻找 生还者 and "rescue workers search for survivors in collapsed houses", highlighting the extracted phrase pair ⟨倒塌的 房屋, collapsed houses⟩]
[Figure: the same alignment, highlighting the larger extracted phrase pair ⟨在 倒塌 的 房屋 里 寻找 生还者, search for survivors in collapsed houses⟩]
A synchronous rule • X → ⟨在 X₁ 里 寻找 X₂, search for X₂ in X₁⟩ • Phrase-based translation unit • Discontinuous translation unit • Control on reordering
A synchronous grammar • Rules: X → ⟨在 X₁ 里 寻找 X₂, search for X₂ in X₁⟩; X → ⟨倒塌的 房屋, collapsed houses⟩; X → ⟨生还者, survivors⟩ • Context-free derivation: X ⇒ ⟨在 X₁ 里 寻找 X₂, search for X₂ in X₁⟩ ⇒ ⟨在 倒塌的 房屋 里 寻找 X₂, search for X₂ in collapsed houses⟩ ⇒ ⟨在 倒塌的 房屋 里 寻找 生还者, search for survivors in collapsed houses⟩
A synchronous grammar • Same rules as on the previous slide • Recognizes: • search for survivors in collapsed houses • search for collapsed houses in survivors • search for survivors collapsed houses in
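To make the synchronized rewriting concrete, a minimal Python sketch of the derivation above; the representation (X1/X2 slots as plain tokens) is a toy choice, not a real decoder:

```python
# The two lexical rules; the linked slots X1/X2 tie the two sides together.
rules = {
    "X1": ("倒塌的 房屋".split(), "collapsed houses".split()),
    "X2": ("生还者".split(), "survivors".split()),
}
glue_src = "在 X1 里 寻找 X2".split()       # source side of the glue rule
glue_tgt = "search for X2 in X1".split()    # target side, with X1/X2 swapped

def expand(tokens, side):
    out = []
    for tok in tokens:
        out.extend(rules[tok][side] if tok in rules else [tok])
    return out

print(" ".join(expand(glue_src, 0)))   # 在 倒塌的 房屋 里 寻找 生还者
print(" ".join(expand(glue_tgt, 1)))   # search for survivors in collapsed houses
```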
[Figure: word-aligned parse-tree pair for 救援 人员 在 倒塌 的 房屋 里 寻找 生还者 (literal gloss: rescue staff / in collapse of house in / search / survivors) and "Rescue workers search for survivors in collapsed houses.", with POS and constituent labels (IN, NNS, JJ, NN, VBP; PP, NP, VP, S) on the English side]
[Figure: the same tree pair, highlighting the extracted translation rule VP → ⟨PP 寻找 NP, search for NP PP⟩]
[Figure: the same tree pair, with the rule written as an SCFG rule: VP-234 → ⟨PP-32 寻找 NP-57, search for NP-57 PP-32⟩]
Outline • Probability • SMT and translation models • SMT for Web search ranking
Web Documents and Search Queries • cold home remedy • cold remeedy • flu treatment • how to deal with stuffy nose?
Map Queries to Documents • Fuzzy keyword matching • Q: cold home remedy • D: best home remedies for cold and flu • Spelling correction • Q: cold remeedies • D: best home remedies for cold and flu • Query alteration • Q: flu treatment • D: best home remedies for cold and flu • Query/document rewriting • Q: how to deal with stuffy nose • D: best home remedies for cold and flu • Where are we now?
Research Agenda (Gao et al. 2010, 2011) • Model documents and queries as different languages (Gao et al., 2010) • Cast mapping queries to documents as bridging the language gap via translation • Leverage statistical machine translation (SMT) technologies and infrastructures to improve search relevance
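A minimal Python sketch of scoring in this spirit, ranking a document $D$ for a query $Q$ by $P(Q \mid D) = \prod_q \sum_w t(q \mid w)\,P(w \mid D)$, in the style of the translation-based retrieval models of Gao et al. (2010); the t-table values and the document below are made up for illustration, not learned from data:

```python
import math

t = {("cold", "cold"): 0.8, ("cold", "flu"): 0.1,          # toy t(q|w) table
     ("remedy", "remedies"): 0.7, ("remedy", "treatment"): 0.2}

def p_word_given_doc(w, doc):
    return doc.count(w) / len(doc)

def translation_score(query, doc):
    logp = 0.0
    for q in query:
        p = sum(t.get((q, w), 0.0) * p_word_given_doc(w, doc) for w in set(doc))
        logp += math.log(p + 1e-10)       # floor; a real system smooths properly
    return logp

doc = "best home remedies for cold and flu".split()
print(translation_score("cold remedy".split(), doc))
```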