Language Modeling Again • So are we smooth now? • Courtesy of Chris Jordan
So what did we talk about last week? • Language models represent documents as multinomial distributions • What is a multinomial? • The Maximum Likelihood Estimate calculates the document model • What is the Maximum Likelihood Estimate? • Smoothing document models
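As a quick refresher, here is a minimal sketch of a maximum-likelihood unigram document model; the tokenization and example sentence are made up purely for illustration:

```python
from collections import Counter

def mle_unigram_model(document_tokens):
    """Maximum Likelihood Estimate: p(t|d) = count(t, d) / |d|."""
    counts = Counter(document_tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}

doc = "the quick brown fox jumps over the lazy dog".split()
model = mle_unigram_model(doc)
print(model["the"])            # 2/9 -- seen terms get their relative frequency
print(model.get("cat", 0.0))   # 0.0 -- unseen terms get zero, which is why we smooth
```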
Why is smoothing so important? • Maximum Likelihood Estimate gives 0 probabilities • Why is that an issue? • What does smoothing do? • What types of smoothing are there?
Challenge questions • What is common to every smoothing technique that we have covered? • What does smoothing really do? • Does it make for a more accurate document model? • Does it replace the need for more data?
A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval • Thoughts? • What is Additive? • What is Interpolation? • What is Backoff?
Laplace / Additive Smoothing • Just increases every raw term frequency by a constant • Is that representative of the document model? • How hard is this to implement? • What happens if the constant added is really large?
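A hedged sketch of additive (Laplace) smoothing over an MLE model; delta = 1 gives classic Laplace, and the vocabulary argument is an assumed stand-in for the corpus vocabulary:

```python
from collections import Counter

def additive_smoothing(document_tokens, vocabulary, delta=1.0):
    """p(t|d) = (count(t, d) + delta) / (|d| + delta * |V|)."""
    counts = Counter(document_tokens)
    doc_len = sum(counts.values())
    denom = doc_len + delta * len(vocabulary)
    return {term: (counts.get(term, 0) + delta) / denom for term in vocabulary}
```

Note that as delta grows very large, the observed counts stop mattering and every document model flattens toward the uniform distribution over the vocabulary.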
Interpolation • Jelinek-Mercer: ps(t) = λp(t|d) + (1 − λ)p(t|corpus) • Dirichlet • Anyone know what this is? • Remember Gaussian? Poisson? Beta? Gamma? • Beta is a distribution over binomial parameters • Dirichlet is the analogous distribution over multinomial parameters
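A minimal Jelinek-Mercer sketch that interpolates the document MLE with a corpus (background) model; the mixing weight λ is an assumed, untuned parameter:

```python
def jelinek_mercer(p_doc, p_corpus, lam=0.5):
    """p_s(t) = lam * p(t|d) + (1 - lam) * p(t|corpus)."""
    terms = set(p_doc) | set(p_corpus)
    return {t: lam * p_doc.get(t, 0.0) + (1 - lam) * p_corpus.get(t, 0.0)
            for t in terms}
```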
Dirichlet / Absolute Discounting • What does Absolute Discounting do? • How is it different from Laplace? From Jelinek-Mercer? • What is the key difference between the λ in Jelinek-Mercer and the δ in Dirichlet and Absolute Discounting? • δ determines how much probability mass is subtracted from seen terms and added to unseen ones
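Sketches of Dirichlet and absolute-discounting smoothing in their usual formulations (as in the Zhai & Lafferty study above); μ and δ are assumed illustrative values, and the corpus model is assumed to cover the full vocabulary:

```python
from collections import Counter

def dirichlet_smoothing(document_tokens, p_corpus, mu=2000):
    """p(t|d) = (count(t, d) + mu * p(t|corpus)) / (|d| + mu)."""
    counts = Counter(document_tokens)
    doc_len = sum(counts.values())
    return {t: (counts.get(t, 0) + mu * p_corpus.get(t, 0.0)) / (doc_len + mu)
            for t in p_corpus}

def absolute_discounting(document_tokens, p_corpus, delta=0.7):
    """Subtract a fixed delta from every seen count; the freed mass goes to the corpus model."""
    counts = Counter(document_tokens)
    doc_len = sum(counts.values())
    sigma = delta * len(counts) / doc_len   # probability mass redistributed to unseen terms
    return {t: max(counts.get(t, 0) - delta, 0) / doc_len + sigma * p_corpus.get(t, 0.0)
            for t in p_corpus}
```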
Back off • What is the idea here? • Do not pad the probability of seen terms • Any idea why this doesn't work? • The seen terms have their probabilities decreased • Too much smoothing?
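A hedged back-off sketch: seen terms keep a discounted document estimate, and only unseen terms fall back to a scaled corpus model. The discount value and the normalizer alpha are illustrative choices, and δ is assumed to be below 1:

```python
from collections import Counter

def backoff(document_tokens, p_corpus, delta=0.7):
    """Seen terms: discounted p(t|d). Unseen terms: alpha * p(t|corpus)."""
    counts = Counter(document_tokens)
    doc_len = sum(counts.values())
    seen = {t: (c - delta) / doc_len for t, c in counts.items()}
    reserved = 1.0 - sum(seen.values())                       # mass freed by the discount
    unseen_mass = sum(p for t, p in p_corpus.items() if t not in counts)
    alpha = reserved / unseen_mass if unseen_mass > 0 else 0.0
    model = dict(seen)
    model.update({t: alpha * p for t, p in p_corpus.items() if t not in counts})
    return model
```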
Pause… Review • Why do we smooth? • Does smoothing make sense? • What is Laplace? • What is Jelinek-Mercer? • What is Dirichlet smoothing? • What is Absolute Discounting? • What is Back off?
Let’s beat this horse some more! • Does everyone know what mean average precision is? • Let’s have a look at the results • Are these really improvements? • What does an increase of .05 precision really mean? • Will that matter to the user?
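For reference, a small sketch of average precision for a single query (mean average precision is just the mean of this value over all queries); the ranking and relevance judgements below are made up:

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Average of precision@k over the ranks k at which relevant documents appear."""
    hits, precisions = 0, []
    for k, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

print(average_precision(["d3", "d1", "d7", "d2"], {"d1", "d2"}))  # (1/2 + 2/4) / 2 = 0.5
```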
And now we come full circle • What is a real performance improvement? • Cranfield paradigm evaluation • Corpus • Queries • Qrels • User trials • Satisfaction • Effectiveness • Efficiency
Cluster Based Smoothing • What will clustering give us? • Cluster the corpus • Find clusters for each document • Mixture model now involves • Document model • Cluster model • Corpus model • Some performance gains • Significant but not so special
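A minimal sketch of the three-component mixture behind cluster-based smoothing; the weights are illustrative placeholders, not tuned values:

```python
def cluster_mixture(p_doc, p_cluster, p_corpus, w_doc=0.6, w_cluster=0.3, w_corpus=0.1):
    """p(t) = w_doc * p(t|d) + w_cluster * p(t|cluster(d)) + w_corpus * p(t|corpus)."""
    terms = set(p_doc) | set(p_cluster) | set(p_corpus)
    return {t: w_doc * p_doc.get(t, 0.0)
               + w_cluster * p_cluster.get(t, 0.0)
               + w_corpus * p_corpus.get(t, 0.0)
            for t in terms}
```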
Relevance Modeling • Blind Relevance Feedback approach • Top documents in the result set used as feedback • A language model is constructed from these top ranked documents for each query • This model is used as the relevance model for probabilistic retrieval
On the topic of Blind Relevance Feedback • How can we use Relative Entropy here? • Derive a model that minimizes the relative entropy with respect to the top-ranked documents • Does Relevance Modeling make sense? • Does using Relative Entropy make sense?
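A sketch of relative entropy (KL divergence) between a relevance model built from the top-ranked documents and a smoothed document model; the tiny floor on the document probabilities is an assumption to guard against zeros:

```python
import math

def kl_divergence(p_relevance, p_document):
    """KL(R || D) = sum_t p_R(t) * log(p_R(t) / p_D(t)); lower means a closer match."""
    return sum(p * math.log(p / p_document.get(t, 1e-12))
               for t, p in p_relevance.items() if p > 0)
```

Ranking documents by increasing KL(R || D) is one standard way to plug a relevance model into retrieval.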
The big assumption • Top ranked documents are a good source of relevant text • This obviously is not always true • There is a lot of noise • Are the top-ranked documents representative of the relevant set? • Relevance modeling and Relative Entropy BRF approaches have been shown to improve performance • But not really…
Review • What is average precision? • What is the Cranfield paradigm? • What alternative sources can be used for smoothing? • Does Blind Relevance Feedback make sense? • Why does it work?
You have been a good class • We have covered • Language Modeling for ad-hoc document retrieval • Unigram model • Maximum Likelihood Estimate • Smoothing Techniques • Different mixture models • Blind Relevance Feedback for Language Modeling
Questions for you • Why do we work with the unigram model? • Why is smoothing important? • How does a language model represent a document? • What is interpolation?
Another application of language modeling • Unsupervised Morphological Analysis • A morpheme is a basic unit of meaning in a language, e.g. pretested: pre - test - ed • English is a relatively easy language • Turkish, Finnish, and German are agglutinative • Very hard
Morfessor • All terms in the vocabulary are candidate morphemes • Terms are recursively split • Build up the candidate morpheme set • Repeatedly analyze the whole vocabulary until the candidate morpheme set can no longer be improved
Swordfish • Ngram based unsupervised morpheme analyzer • Character Ngrams • Substrings • A language model is constructed over all ngrams of all lengths • Maximum Likelihood Estimate • Terms are recursively split based on the likelihood of the ngrams
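A very simplified sketch of the recursive-splitting idea (not the published Swordfish implementation): split a term at the point where the two resulting character ngrams are jointly most likely under the ngram model, and recurse while splitting beats keeping the string whole:

```python
def split_term(term, ngram_prob, min_len=2):
    """Recursively split a term wherever the product of ngram probabilities beats the whole string."""
    best_score = ngram_prob.get(term, 0.0)
    best_split = None
    for i in range(min_len, len(term) - min_len + 1):
        left, right = term[:i], term[i:]
        score = ngram_prob.get(left, 0.0) * ngram_prob.get(right, 0.0)
        if score > best_score:
            best_score, best_split = score, (left, right)
    if best_split is None:
        return [term]
    return (split_term(best_split[0], ngram_prob, min_len)
            + split_term(best_split[1], ngram_prob, min_len))

# e.g. split_term("pretested", probs) could yield ["pre", "test", "ed"] given suitable ngram probabilities
```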
Swordfish Results • Reasonable Results • Character ngrams are useful in finding morphemes • All morphemes are ngrams but not all ngrams are morphemes • The most prominent ngrams appear to be morphemes • How one defines prominent is an open question • Check out the PASCAL Morpho-Challenge