Formal Multinomial and Multiple-Bernoulli Language Models Don Metzler
Overview • Two formal estimation techniques • MAP estimates [Zaragoza, Hiemstra, Tipping, SIGIR’03] • Posterior expectations • Language models considered • Multinomial • Multiple-Bernoulli (2 models)
Bayesian Framework (MAP Estimation) • Assume textual data X (document, query, etc.) is generated by sampling from some distribution P(X | θ) parameterized by θ • Assume some prior P(θ) over θ • For each X, we want to find the maximum a posteriori (MAP) estimate (sketched below) • θX is our (language) model for data X
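As a sketch of the estimate the last bullet refers to (the slide's own equation is not reproduced here), the MAP model for X is:

```latex
\hat{\theta}_X = \arg\max_{\theta} P(\theta \mid X)
              = \arg\max_{\theta} \frac{P(X \mid \theta)\,P(\theta)}{P(X)}
              = \arg\max_{\theta} P(X \mid \theta)\,P(\theta)
```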
Multinomial • Modeling assumptions: X is a sequence of terms drawn from a multinomial distribution over the vocabulary, parameterized by θ, with a Dirichlet(α) prior on θ • Why Dirichlet? • Conjugate prior to the multinomial • Easy to work with
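For reference, with term counts c(w, X), document length |X|, and a Dirichlet(α) prior, conjugacy gives the standard closed-form MAP estimate (a sketch consistent with the slide, not its original equation):

```latex
P(w \mid \hat{\theta}_X) = \frac{c(w, X) + \alpha_w - 1}{|X| + \sum_{v \in V} (\alpha_v - 1)}
```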
How do we set α? • α = 1 => uniform prior => ML estimate • α = 2 => Laplacian smoothing • Dirichlet-like smoothing (see the sketch below)
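The missing formula is presumably the usual choice that ties the prior to the collection model C; under that assumption the MAP estimate becomes the familiar Dirichlet-smoothed probability:

```latex
\alpha_w = \mu\, P(w \mid C) + 1
\quad\Longrightarrow\quad
P(w \mid \hat{\theta}_X) = \frac{c(w, X) + \mu\, P(w \mid C)}{|X| + \mu}
```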
[Figure: multinomial estimates for X = A B B B, with P(A | C) = 0.45 and P(B | C) = 0.55. Left: ML estimate (α = 1); center: Laplace (α = 2); right: α = μP(w | C) with μ = 10]
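A small Python sketch reproducing the three multinomial estimates for this example (illustrative only; the numbers follow from the formulas above, not from the original figure):

```python
from collections import Counter

X = ["A", "B", "B", "B"]          # observed text
P_C = {"A": 0.45, "B": 0.55}      # collection (background) model P(w | C)
V = sorted(P_C)                   # vocabulary
counts = Counter(X)
n = len(X)                        # document length |X|

def ml(w):                        # alpha = 1: maximum likelihood
    return counts[w] / n

def laplace(w):                   # alpha = 2: add-one (Laplace) smoothing
    return (counts[w] + 1) / (n + len(V))

def dirichlet(w, mu=10):          # alpha_w = mu * P(w|C) + 1
    return (counts[w] + mu * P_C[w]) / (n + mu)

for w in V:
    print(w, ml(w), laplace(w), dirichlet(w))
```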
Multiple-Bernoulli • Assume vocabulary V = A B C D • How do we model text X = D B B D? • In multinomial, we represent X as the sequence D B B D • In multiple-Bernoulli we represent X as the vector [0 1 0 1] denoting terms B and D occur in X • Each X represented by single binary vector
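A minimal Python illustration of this representation, assuming the fixed vocabulary ordering A, B, C, D:

```python
V = ["A", "B", "C", "D"]
X = ["D", "B", "B", "D"]

# Model A: one binary vector per document; entry is 1 if the term occurs at all
binary_vector = [1 if v in X else 0 for v in V]
print(binary_vector)  # [0, 1, 0, 1]
```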
Multiple-Bernoulli(Model A) • Modeling assumptions: • Each X is a single sample from a multiple-Bernoulli distribution parameterized by θ • Use conjugate prior (multiple-Beta)
Problems with Model A • Ignores document length • This may be desirable in some applications • Ignores term frequencies • How to solve this? • Model X as a collection of samples (one per word occurrence) from an underlying multiple-Bernoulli distribution • Example: V = A B C D, X = B D D B • Representation: { [0 1 0 0], [0 0 0 1], [0 0 0 1], [0 1 0 0] }
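The same representation can be written down programmatically, one indicator vector per word occurrence (a sketch of the example above):

```python
V = ["A", "B", "C", "D"]
X = ["B", "D", "D", "B"]

# Model B: one indicator vector per word occurrence, so document length
# and term frequencies are preserved
vectors = [[1 if v == w else 0 for v in V] for w in X]
print(vectors)  # [[0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1], [0, 1, 0, 0]]
```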
Multiple-Bernoulli(Model B) • Modeling assumptions: • Each X is a collection (multiset) of indicator vectors sampled from a multiple-Bernoulli distribution parameterized by θ • Use conjugate prior (multiple-Beta)
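With a Beta(α_w, β_w) prior on each term, conjugacy again gives a closed-form MAP estimate; stated as a sketch (the slide's equation is not shown), with c(w, X) the term frequency and |X| the number of word occurrences:

```latex
P(w \mid \hat{\theta}_X) = \frac{c(w, X) + \alpha_w - 1}{|X| + \alpha_w + \beta_w - 2}
```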
How do we set α, β? • α = β = 1 => uniform prior => ML estimate • But we want smoothed probabilities… • One possibility (sketched below)
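One plausible reading of the missing "one possibility", chosen here so that the result matches the multinomial's Dirichlet-like smoothing, is to tie both Beta parameters to the collection model:

```latex
\alpha_w = \mu\, P(w \mid C) + 1, \qquad
\beta_w  = \mu\,\bigl(1 - P(w \mid C)\bigr) + 1
\quad\Longrightarrow\quad
P(w \mid \hat{\theta}_X) = \frac{c(w, X) + \mu\, P(w \mid C)}{|X| + \mu}
```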
[Figure: Multiple-Bernoulli Model B estimates for X = A B B B, with P(A | C) = 0.45 and P(B | C) = 0.55. Left: ML estimate (α = β = 1); center: smoothed (μ = 1); right: smoothed (μ = 10)]
Another approach… • Another way to formally estimate language models is via the expectation over the posterior (sketched below) • Takes more uncertainty into account than the MAP estimate • Because we chose conjugate priors, the integral can be evaluated analytically
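For the Dirichlet-multinomial case the posterior expectation also has a closed form; the standard result (a sketch, not the slide's own derivation) is:

```latex
P(w \mid X) = \mathbb{E}\bigl[\theta_w \mid X\bigr]
            = \int \theta_w \, p(\theta \mid X)\, d\theta
            = \frac{c(w, X) + \alpha_w}{|X| + \sum_{v \in V} \alpha_v}
```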
Multinomial / Multiple-Bernoulli Connection • Multinomial • Multiple-Bernoulli • Dirichlet smoothing
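The slide's equations are not reproduced here, but one reading of the connection is that, under posterior expectation with the prior tied to the collection model (α_w = μ P(w | C) for the multinomial, and the analogous Beta parameters for Model B), both models yield the same Dirichlet-smoothed estimate:

```latex
P(w \mid X) = \frac{c(w, X) + \mu\, P(w \mid C)}{|X| + \mu}
```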
Bayesian Framework(Ranking) • Query likelihood • estimate model θD for each document D • score document D by P(Q | θD) • measures likelihood of observing query Q given model θD • KL-divergence • estimate model for both query and document • score document D by KL(θQ || θD) • measures “distance” between two models • Predictive density
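A minimal Python sketch of the query-likelihood option with Dirichlet-smoothed document models (an illustration under the assumptions above, not the talk's actual implementation; the helper names are hypothetical):

```python
import math
from collections import Counter

def collection_model(docs):
    """Background term probabilities P(w | C) estimated from the whole collection."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def query_likelihood(query, doc, p_c, mu=1000):
    """log P(Q | theta_D) with Dirichlet-smoothed term estimates."""
    tf = Counter(doc)
    n = len(doc)
    score = 0.0
    for w in query:
        p = (tf[w] + mu * p_c.get(w, 0.0)) / (n + mu)
        if p > 0:
            score += math.log(p)
    return score

docs = [["a", "b", "b", "b"], ["a", "a", "c", "d"]]
p_c = collection_model(docs)
query = ["a", "b"]

# Rank documents by query likelihood (higher log-probability first)
ranked = sorted(range(len(docs)),
                key=lambda i: query_likelihood(query, docs[i], p_c, mu=10),
                reverse=True)
print(ranked)
```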
Conclusions • Both estimation and smoothing can be achieved using Bayesian estimation techniques • Little difference between MAP and posterior expectation estimates; the result mostly depends on μ • Not much difference between multinomial and multiple-Bernoulli language models • Scoring the multinomial is cheaper • No good reason to choose multiple-Bernoulli over multinomial in general