Statistical Translation and Web Search Ranking Jianfeng Gao Natural language processing, MSR July 22, 2011
Who should be here? • Interested in statistical machine translation and Web search ranking • Interested in modeling technologies • Looking for topics for your master's or PhD thesis • A difficult topic: very hard to beat a simple baseline • An easy topic: others cannot beat it either
Outline • Probability • Statistical Machine Translation (SMT) • SMT for Web search ranking
Probability (1/2) • Probability space: $0 \le P(x) \le 1$, with $\sum_x P(x) = 1$ • Cannot say $P(x) > 1$ or $P(x) < 0$ • Joint probability: $P(x, y)$ • Probability that x and y are both true • Conditional probability: $P(y \mid x)$ • Probability that y is true when we already know x is true • Independence: $P(x, y) = P(x)\,P(y)$ • x and y are independent
Probability (2/2) • $H$: assumptions on which the probabilities are based • Product rule – from the def of conditional probability: $P(x, y \mid H) = P(x \mid H)\,P(y \mid x, H)$ • Sum rule – a rewrite of the marginal probability def: $P(x \mid H) = \sum_y P(x, y \mid H)$ • Bayes rule – from the product rule: $P(y \mid x, H) = \dfrac{P(x \mid y, H)\,P(y \mid H)}{P(x \mid H)}$
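To make the three rules concrete, here is a toy numeric check; the numbers are made up for illustration, and the conditioning on $H$ is dropped for brevity:

```latex
% Assume P(x) = 0.5, P(y|x) = 0.4, P(y|not x) = 0.2.
\begin{aligned}
P(x, y)     &= P(x)\,P(y \mid x) = 0.5 \times 0.4 = 0.2
            && \text{(product rule)} \\
P(y)        &= P(x, y) + P(\neg x, y) = 0.2 + 0.5 \times 0.2 = 0.3
            && \text{(sum rule)} \\
P(x \mid y) &= \frac{P(y \mid x)\,P(x)}{P(y)} = \frac{0.4 \times 0.5}{0.3} = \tfrac{2}{3}
            && \text{(Bayes rule)}
\end{aligned}
```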
Statistical Language Modeling (SLM) • Model form • capture language structure via a probabilistic model • Model parameters • estimation of free parameters using training data
Model Form • How to incorporate language structure into a probabilistic model • Task: next word prediction • Fill in the blank: “The dog of our neighbor ___” • Starting point: word n-gram model • Very simple, yet surprisingly effective • Words are generated from left-to-right • Assumes no other structure than words themselves
Word N-gram Model • Word-based model: $P(W) = P(w_1 w_2 \ldots w_n)$ • Using the chain rule on its history (= preceding words): $P(w_1 w_2 \ldots w_n) = P(w_1)\,P(w_2 \mid w_1) \cdots P(w_n \mid w_1 \ldots w_{n-1}) = \prod_{i=1}^{n} P(w_i \mid w_1 \ldots w_{i-1})$
Word N-gram Model • How do we get probability estimates? • Get text and count! • Problem of using the whole history • Rare events: unreliable probability estimates • Assuming a vocabulary of 20,000 words, a bigram model already has $20{,}000^2 = 4 \times 10^8$ parameters, and a trigram model $20{,}000^3 = 8 \times 10^{12}$ • From Manning and Schütze 1999: 194
Word N-gram Model • Markov independence assumption • A word depends only on the N−1 preceding words • N = 3 → word trigram model • Reduce the number of parameters in the model • By forming equivalence classes • Word trigram model: $P(w_i \mid w_1 \ldots w_{i-1}) \approx P(w_i \mid w_{i-2}\, w_{i-1})$
Model Parameters • Bayesian estimation paradigm • Maximum likelihood estimation (MLE) • Smoothing in N-gram language models
Bayesian Paradigm • $P(\theta \mid D)$ – Posterior probability • $P(D \mid \theta)$ – Likelihood • $P(\theta)$ – Prior probability • $P(D)$ – Marginal probability • $P(\theta \mid D) = \dfrac{P(D \mid \theta)\,P(\theta)}{P(D)}$ • Likelihood versus probability: $P(D \mid \theta)$ • for fixed $\theta$, defines a probability over $D$; • for fixed $D$, defines the likelihood of $\theta$. • Never say "the likelihood of the data" • Always say "the likelihood of the parameters given the data"
Maximum Likelihood Estimation (MLE) • $\theta$: model; $D$: data • $\hat\theta = \arg\max_\theta P(\theta \mid D) = \arg\max_\theta P(D \mid \theta)\,P(\theta) / P(D)$ • Assume a uniform prior $P(\theta)$ • $P(D)$ is independent of $\theta$, and is dropped • $\hat\theta = \arg\max_\theta P(D \mid \theta)$, where $P(D \mid \theta)$ is the likelihood of parameter $\theta$ • Key difference between MLE and Bayesian estimation • MLE assumes that $\theta$ is fixed but unknown • Bayesian estimation assumes that $\theta$ itself is a random variable with a prior distribution $P(\theta)$
MLE for Trigram LM • It is easy – let us get some real text and start to count: $P(w_3 \mid w_1 w_2) = \dfrac{\text{count}(w_1 w_2 w_3)}{\text{count}(w_1 w_2)}$ • But why is this the MLE solution?
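A minimal Python sketch of this counting recipe, on a made-up one-sentence corpus (no smoothing yet, so unseen trigrams get probability zero):

```python
from collections import Counter

# Toy corpus; a real LM is trained on a much larger corpus.
corpus = "the dog of our neighbor barks".split()
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))   # trigram counts
bi = Counter(zip(corpus, corpus[1:]))                # bigram (history) counts

def p_mle(w1, w2, w3):
    # P(w3 | w1 w2) = count(w1 w2 w3) / count(w1 w2)
    return tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0

print(p_mle("dog", "of", "our"))   # 1.0 on this tiny corpus
```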
Derivation of MLE for N-gram • Homework – an MSR interview question • Hints • This is a constrained optimization problem • Use the log likelihood as the objective function • Assume a multinomial distribution for the LM • Introduce a Lagrange multiplier for the constraints
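For reference, one way the derivation can go; this is only a sketch under the multinomial assumption from the hints, so stop here if you want to work it out yourself:

```latex
% Maximize the log likelihood of counts c(h, w) for a fixed history h,
% with a Lagrange multiplier for the constraint sum_w P(w|h) = 1:
\begin{aligned}
\mathcal{L} &= \sum_{w} c(h, w) \log P(w \mid h)
               + \lambda \Big( 1 - \sum_{w} P(w \mid h) \Big) \\
\frac{\partial \mathcal{L}}{\partial P(w \mid h)}
            &= \frac{c(h, w)}{P(w \mid h)} - \lambda = 0
               \;\Rightarrow\; P(w \mid h) = \frac{c(h, w)}{\lambda} \\
\sum_{w} P(w \mid h) &= 1
               \;\Rightarrow\; \lambda = \sum_{w'} c(h, w'),
               \qquad P(w \mid h) = \frac{c(h, w)}{\sum_{w'} c(h, w')}
\end{aligned}
```

With $h = w_1 w_2$, this recovers the counting formula on the previous slide.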
Sparse Data Problem • Say our vocabulary size is $|V|$ • There are $|V|^3$ parameters in the trigram LM • $|V| = 20{,}000 \Rightarrow 20{,}000^3 = 8 \times 10^{12}$ parameters • Most trigrams have a zero count even in a large text corpus • oops…
Smoothing: Adding One • Add-one smoothing (from the Bayesian paradigm): $P(w_3 \mid w_1 w_2) = \dfrac{\text{count}(w_1 w_2 w_3) + 1}{\text{count}(w_1 w_2) + |V|}$ • But works very badly – do not use this • Add-delta smoothing: $P(w_3 \mid w_1 w_2) = \dfrac{\text{count}(w_1 w_2 w_3) + \delta}{\text{count}(w_1 w_2) + \delta |V|}$ • Still very bad – do not use this
Smoothing: Backoff • Backoff trigram to bigram, bigram to unigram: $P(w_3 \mid w_1 w_2) = \dfrac{\text{count}(w_1 w_2 w_3) - D}{\text{count}(w_1 w_2)}$ if $\text{count}(w_1 w_2 w_3) > 0$, and $\alpha(w_1 w_2)\,P(w_3 \mid w_2)$ otherwise • $D \in (0, 1)$ is a discount constant – absolute discounting • $\alpha$ is calculated so that the probabilities sum to 1 (homework) • Simple and effective – use this one!
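A minimal Python sketch of the trigram-to-bigram backoff step, continuing the toy setup above; the lower-order model here is plain MLE, and computing the fully normalized $\alpha$ is exactly the homework, so this sketch only redistributes the discounted mass approximately:

```python
from collections import Counter

corpus = "the dog of our neighbor barks at the dog of the butcher".split()
uni = Counter(corpus)
bi = Counter(zip(corpus, corpus[1:]))
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))
D = 0.5                                   # absolute-discount constant in (0, 1)

def p_bi(w2, w3):
    # Lower-order model: MLE bigram (a full LM would back off to unigrams too).
    return bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0

def p_tri_backoff(w1, w2, w3):
    h = bi[(w1, w2)]                      # count of the history w1 w2
    if tri[(w1, w2, w3)] > 0:
        return (tri[(w1, w2, w3)] - D) / h        # discounted ML estimate
    seen = sum(1 for (a, b, _) in tri if (a, b) == (w1, w2))
    alpha = D * seen / h if h else 1.0    # mass freed by discounting; the
    return alpha * p_bi(w2, w3)           # homework: normalize it exactly
```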
Outline • Probability • SMT and translation models • SMT for Web search ranking
SMT • C: 救援人员在倒塌的房屋里寻找生还者 • E: Rescue workers search for survivors in collapsed houses
Translation process (generative story) • C is broken into translation units • Each unit is translated into English • Glue translated units to form E • Translation models • Word-based models • Phrase-based models • Syntax-based models
Generative Modeling • Art → Story • Science → Math • Engineering → Code
Generative Modeling for SMT • Story making • how a target sentence is generated from a source sentence, step by step • Mathematical formulation • modeling each generation step in the generative story using a probability distribution • Parameter estimation • implementing an effective way of estimating the probability distributions from training data
Word-Based Models: IBM Model 1 • We first choose a length $m$ for the target sentence $E$, according to the distribution $P(m \mid C)$. • Then, for each position $j = 1, \ldots, m$ in the target sentence, we choose a position $a_j$ in the source sentence $C = c_1 \ldots c_n$ from which to generate the $j$-th target word, according to the distribution $P(a_j \mid C)$. • Finally, we generate the target word $e_j$ by translating $c_{a_j}$, according to the distribution $t(e_j \mid c_{a_j})$.
Mathematical Formulation • Assume that the choice of the length $m$ is independent of $C$ and $n$: $P(m \mid C) = \epsilon$ • Assume that all positions in the source sentence (including the empty word at position 0) are equally likely to be chosen: $P(a_j \mid C) = \dfrac{1}{n + 1}$ • Assume that each target word is generated independently from $c_{a_j}$: $P(E \mid C) = \dfrac{\epsilon}{(n + 1)^m} \prod_{j=1}^{m} \sum_{i=0}^{n} t(e_j \mid c_i)$
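A minimal Python sketch of this Model 1 formula; the translation-table values below are made up for illustration, not learned ones:

```python
import math

def model1_logprob(src, tgt, t, epsilon=1.0):
    # log P(E|C) = log( epsilon / (n+1)^m * prod_j sum_i t(e_j | c_i) )
    src = ["NULL"] + src                      # position 0: the empty word
    logp = math.log(epsilon) - len(tgt) * math.log(len(src))
    for e in tgt:
        logp += math.log(sum(t.get((e, c), 0.0) for c in src) or 1e-12)
    return logp

# Toy t-table entries (made-up values):
t = {("search", "寻找"): 0.8, ("for", "寻找"): 0.1, ("survivors", "生还者"): 0.9}
print(model1_logprob("寻找 生还者".split(), "search for survivors".split(), t))
```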
Parameter Estimation • Model form: $t(e \mid c)$ • MLE on word-aligned training data: $t(e \mid c) = \dfrac{\text{count}(c, e)}{\sum_{e'} \text{count}(c, e')}$ • Don't forget smoothing
Mathematical Formulation (phrase-based models) • Assume a uniform probability over segmentations $S$ • Use the maximum approximation to the sum: $P(E \mid C) \approx \max_S P(E, S \mid C)$ • Assume each phrase is translated independently, and use a distance-based reordering model: $P(E, S \mid C) \approx \prod_k P(\tilde e_k \mid \tilde c_k)\, d(\text{start}_k - \text{end}_{k-1} - 1)$
Parameter Estimation (phrase-based models) • MLE: $P(\tilde e \mid \tilde c) = \dfrac{\text{count}(\tilde c, \tilde e)}{\sum_{\tilde e'} \text{count}(\tilde c, \tilde e')}$ • Don't forget smoothing
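A minimal Python sketch of this relative-frequency estimate; the pair list is a made-up stand-in for phrase pairs extracted from word-aligned data:

```python
from collections import Counter

pairs = [                                  # toy extraction output
    ("倒塌的 房屋", "collapsed houses"),
    ("倒塌的 房屋", "collapsed houses"),
    ("倒塌的 房屋", "collapsed homes"),
    ("生还者", "survivors"),
]
pair_count = Counter(pairs)
src_count = Counter(c for c, _ in pairs)

def p_phrase(e, c):
    # P(e~ | c~) = count(c~, e~) / count(c~); smoothing omitted in this sketch
    return pair_count[(c, e)] / src_count[c]

print(p_phrase("collapsed houses", "倒塌的 房屋"))   # 2/3
```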
Story (syntax-based models) • Parse an input Chinese sentence into a parse tree • Translate each Chinese constituent into English, e.g., VP → ⟨PP 寻找 NP, search for NP PP⟩ • Glue these English constituents into a well-formed English sentence.
Other Two Tasks? • Mathematical formulation • Based on synchronous context-free grammar (SCFG) • Parameter estimation • Learning SCFG rules from data • Homework • Let us go through an example (thanks to Michel Galley) • Hierarchical phrase model • Linguistically syntax-based models
[Figure: word alignment between 救援 人员 在 倒塌 的 房屋 里 寻找 生还者 and "rescue workers search for survivors in collapsed houses", highlighting the extracted phrase pair ⟨倒塌的 房屋, collapsed houses⟩]
[Figure: the same alignment, highlighting the larger extracted phrase pair ⟨在 倒塌 的 房屋 里 寻找 生还者, search for survivors in collapsed houses⟩]
A synchronous rule • X → ⟨在 X₁ 里 寻找 X₂, search for X₂ in X₁⟩ • Phrase-based translation unit • Discontinuous translation unit • Control on reordering
A synchronous grammar • Rules: X → ⟨在 X₁ 里 寻找 X₂, search for X₂ in X₁⟩; X → ⟨倒塌的 房屋, collapsed houses⟩; X → ⟨生还者, survivors⟩ • Context-free derivation: X ⇒ ⟨在 X₁ 里 寻找 X₂, search for X₂ in X₁⟩ ⇒ ⟨在 倒塌的 房屋 里 寻找 X₂, search for X₂ in collapsed houses⟩ ⇒ ⟨在 倒塌的 房屋 里 寻找 生还者, search for survivors in collapsed houses⟩
A synchronous grammar • Same rules as on the previous slide • Recognizes: • search for survivors in collapsed houses • search for collapsed houses in survivors • search for survivors collapsed houses in
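To make the synchronized rewriting concrete, a minimal Python sketch of the derivation above; the representation (X1/X2 slots as plain tokens) is a toy choice, not a real decoder:

```python
# The two lexical rules; the linked slots X1/X2 tie the two sides together.
rules = {
    "X1": ("倒塌的 房屋".split(), "collapsed houses".split()),
    "X2": ("生还者".split(), "survivors".split()),
}
glue_src = "在 X1 里 寻找 X2".split()       # source side of the glue rule
glue_tgt = "search for X2 in X1".split()    # target side, with X1/X2 swapped

def expand(tokens, side):
    out = []
    for tok in tokens:
        out.extend(rules[tok][side] if tok in rules else [tok])
    return out

print(" ".join(expand(glue_src, 0)))   # 在 倒塌的 房屋 里 寻找 生还者
print(" ".join(expand(glue_tgt, 1)))   # search for survivors in collapsed houses
```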
[Figure: word-aligned parse-tree pair for 救援 人员 在 倒塌 的 房屋 里 寻找 生还者 (literal gloss: rescue staff / in collapse of house in / search / survivors) and "Rescue workers search for survivors in collapsed houses.", with POS and constituent labels (IN, NNS, JJ, NN, VBP; PP, NP, VP, S) on the English side]
[Figure: the same tree pair, highlighting the extracted translation rule VP → ⟨PP 寻找 NP, search for NP PP⟩]
[Figure: the same tree pair, with the rule written as an SCFG rule: VP-234 → ⟨PP-32 寻找 NP-57, search for NP-57 PP-32⟩]
Outline • Probability • SMT and translation models • SMT for Web search ranking
Web Documents and Search Queries • cold home remedy • cold remeedy • flu treatment • how to deal with stuffy nose?
Map Queries to Documents • Fuzzy keyword matching • Q: cold home remedy • D: best home remedies for cold and flu • Spelling correction • Q: cold remeedies • D: best home remedies for cold and flu • Query alteration • Q: flu treatment • D: best home remedies for cold and flu • Query/document rewriting • Q: how to deal with stuffy nose • D: best home remedies for cold and flu • Where are we now?
Research Agenda (Gao et al. 2010, 2011) • Model documents and queries as different languages (Gao et al., 2010) • Cast mapping queries to documents as bridging the language gap via translation • Leverage statistical machine translation (SMT) technologies and infrastructures to improve search relevance
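A minimal Python sketch of scoring in this spirit, ranking a document $D$ for a query $Q$ by $P(Q \mid D) = \prod_q \sum_w t(q \mid w)\,P(w \mid D)$, in the style of the translation-based retrieval models of Gao et al. (2010); the t-table values and the document below are made up for illustration, not learned from data:

```python
import math

t = {("cold", "cold"): 0.8, ("cold", "flu"): 0.1,          # toy t(q|w) table
     ("remedy", "remedies"): 0.7, ("remedy", "treatment"): 0.2}

def p_word_given_doc(w, doc):
    return doc.count(w) / len(doc)

def translation_score(query, doc):
    logp = 0.0
    for q in query:
        p = sum(t.get((q, w), 0.0) * p_word_given_doc(w, doc) for w in set(doc))
        logp += math.log(p + 1e-10)       # floor; a real system smooths properly
    return logp

doc = "best home remedies for cold and flu".split()
print(translation_score("cold remedy".split(), doc))
```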