Maximum Entropy
"… the fact that a certain probability distribution maximizes entropy subject to certain constraints representing our incomplete information, is the fundamental property which justifies use of that distribution for inference; it agrees with everything that is known, but carefully avoids assuming anything that is not known. It is a transcription into mathematics of an ancient principle of wisdom …" (Jaynes, 1990)
[from: A Maximum Entropy Approach to NLP by A. L. Berger, S. A. Della Pietra and V. J. Della Pietra, Computational Linguistics, Vol. 22, No. 1, 1996]
Example
• Let us try to see how an expert would translate the English word 'in' into Italian.
• In = {in, dentro, di, <0>, a}
• If the translator always selects from this list, then P(in) + P(dentro) + P(di) + P(<0>) + P(a) = 1.
• If there is no preference, P(.) = 1/5 for each.
• Suppose we notice that the translator chooses dentro or di in 30% of the cases. This changes our constraints to:
• P(dentro) + P(di) = 0.3 and
• P(in) + P(dentro) + P(di) + P(<0>) + P(a) = 1
Example – cont.
• This changes the distribution as follows (the most uniform p satisfying these constraints):
• P(dentro) = 3/20 and P(di) = 3/20
• P(in) = P(<0>) = P(a) = 7/30
• Suppose we inspect the data further and note another interesting fact: in half of the cases the expert chooses either in or di. So:
• P(dentro) + P(di) = 0.3
• P(in) + P(dentro) + P(di) + P(<0>) + P(a) = 1
• P(in) + P(di) = 0.5
• Which p is, in this case, the most uniform? (See the numerical sketch below.)
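Since symmetry alone no longer gives the answer, a quick numerical check is useful. Below is a minimal sketch (assuming NumPy and SciPy are available; the variable names are illustrative) that finds the maximum-entropy distribution satisfying all three constraints:

```python
import numpy as np
from scipy.optimize import minimize

words = ["in", "dentro", "di", "<0>", "a"]

def neg_entropy(p):
    # Minimizing -H(p) = sum_i p_i log p_i maximizes the entropy.
    return np.sum(p * np.log(p + 1e-12))

constraints = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},      # probabilities sum to 1
    {"type": "eq", "fun": lambda p: p[1] + p[2] - 0.3},  # P(dentro) + P(di) = 0.3
    {"type": "eq", "fun": lambda p: p[0] + p[2] - 0.5},  # P(in) + P(di) = 0.5
]

result = minimize(neg_entropy, x0=np.full(5, 0.2),       # start from uniform
                  bounds=[(0.0, 1.0)] * 5, constraints=constraints)
for word, prob in zip(words, result.x):
    print(f"P({word}) = {prob:.4f}")
```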
Example – motivation
• How can we measure the uniformity of a model?
• Even if we can answer that question, how do we determine the parameters of such a model?
• The Maximum Entropy principle answers both questions: model everything that is known and assume nothing about what is unknown.
Aim
• Construct a statistical model of the process that generated the training sample, summarized by its empirical distribution p̃(x, y).
• The model is a conditional distribution p(y|x): given a context x, the probability that the process outputs y.
The expected value of a feature f with respect to the empirical distribution is exactly the statistic we are interested in. For a binary-valued feature function f(x, y), the expected value is:

$$\tilde{p}(f) = \sum_{x,y} \tilde{p}(x,y)\, f(x,y)$$
Constraint: the expected value of f under the model must equal the expected value under the training sample:

$$\sum_{x,y} \tilde{p}(x)\, p(y|x)\, f(x,y) = \sum_{x,y} \tilde{p}(x,y)\, f(x,y)$$
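As a concrete illustration, here is a minimal sketch (names are illustrative, not from the paper) of how the two expectations could be computed from a training sample of (x, y) pairs; for simplicity the model expectation sums only over the outputs observed in training:

```python
from collections import Counter

def empirical_expectation(sample, f):
    """p~(f) = sum_{x,y} p~(x,y) f(x,y), with p~ the relative frequency."""
    n = len(sample)
    return sum(f(x, y) for x, y in sample) / n

def model_expectation(sample, f, p_cond):
    """p(f) = sum_{x,y} p~(x) p(y|x) f(x,y); p_cond(y, x) is the model P(y|x)."""
    n = len(sample)
    xs = Counter(x for x, _ in sample)   # p~(x) via relative counts
    ys = {y for _, y in sample}          # candidate outputs seen in training
    return sum((cx / n) * p_cond(y, x) * f(x, y)
               for x, cx in xs.items() for y in ys)
```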
What does uniform mean?
• The mathematical measure of the uniformity of a conditional distribution p(y|x) is the conditional entropy, here written H(p):

$$H(p) = -\sum_{x,y} \tilde{p}(x)\, p(y|x)\, \log p(y|x)$$

• The factor p̃(x) p(y|x) inside the sum is the joint probability of x and y under the model.
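A minimal sketch of this entropy computation, reusing the p_cond convention from the sketch above:

```python
import math

def conditional_entropy(p_tilde_x, p_cond, ys):
    """H(p) = -sum_{x,y} p~(x) p(y|x) log p(y|x).
    p_tilde_x: dict mapping x -> p~(x); p_cond(y, x): model P(y|x); ys: outputs."""
    h = 0.0
    for x, px in p_tilde_x.items():
        for y in ys:
            pyx = p_cond(y, x)
            if pyx > 0.0:             # 0 log 0 is taken as 0
                h -= px * pyx * math.log(pyx)
    return h
```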
The Maximum Entropy Principle
• Among all models that satisfy the constraints (expected feature values must equal observed feature values), select the one with maximum entropy:

$$p^{*} = \operatorname*{argmax}_{p \in C} H(p), \qquad C = \{\, p \mid p(f_i) = \tilde{p}(f_i) \text{ for } i = 1, \dots, n \,\}$$
Maximizing the Entropy … Lagrange
• Maximizing H(p) subject to the constraints is a constrained optimization problem; it is solved by introducing one Lagrange multiplier λ_i per feature constraint:

$$\Lambda(p, \lambda) = H(p) + \sum_i \lambda_i \left( p(f_i) - \tilde{p}(f_i) \right)$$

• Setting the derivative with respect to p to zero shows that the solution has an exponential (log-linear) form:

$$p_{\lambda}(y|x) = \frac{1}{Z_{\lambda}(x)} \exp\Big(\sum_i \lambda_i f_i(x,y)\Big), \qquad Z_{\lambda}(x) = \sum_y \exp\Big(\sum_i \lambda_i f_i(x,y)\Big)$$
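A minimal sketch of the resulting log-linear model (the function name and argument layout are illustrative):

```python
import math

def p_lambda(y, x, lambdas, features, ys):
    """p_lambda(y|x) for a log-linear model; ys is the set of possible outputs."""
    def score(yy):
        return math.exp(sum(lam * f(x, yy) for lam, f in zip(lambdas, features)))
    z = sum(score(yy) for yy in ys)   # normalizing constant Z_lambda(x)
    return score(y) / z
```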
The Algorithm 1 (Improved Iterative Scaling)
• Input: features f_1, …, f_n and the empirical distribution p̃(x, y)
• Output: optimal parameter values λ_i*
• Step 1: start with λ_i = 0 for all i.
• Step 2: for each i, (a) let Δλ_i be the solution of the update equation, (b) set λ_i ← λ_i + Δλ_i.
• Step 3: repeat step 2 until all λ_i have converged.
Step 2a.
• In general, Δλ_i is the solution of

$$\sum_{x,y} \tilde{p}(x)\, p_{\lambda}(y|x)\, f_i(x,y)\, \exp\big(\Delta\lambda_i\, f^{\#}(x,y)\big) = \tilde{p}(f_i), \qquad f^{\#}(x,y) = \sum_i f_i(x,y)$$

• For constant feature counts, i.e. when f^{#}(x, y) = M for every pair (x, y), this has a closed form:

$$\Delta\lambda_i = \frac{1}{M} \log \frac{\tilde{p}(f_i)}{p_{\lambda}(f_i)}$$
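Putting the pieces together, here is a minimal sketch of one iterative-scaling update in the constant-count case; it reuses the empirical_expectation and model_expectation helpers sketched earlier and assumes p_cond is the current model p_λ(y|x):

```python
import math

def gis_step(lambdas, features, sample, p_cond, M):
    """One update pass: lambda_i += (1/M) * log(p~(f_i) / p_lambda(f_i))."""
    new_lambdas = []
    for lam, f in zip(lambdas, features):
        p_emp = empirical_expectation(sample, f)       # observed expectation p~(f_i)
        p_mod = model_expectation(sample, f, p_cond)   # model expectation p_lambda(f_i)
        new_lambdas.append(lam + math.log(p_emp / p_mod) / M)
    return new_lambdas
```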
The Algorithm 1 Revisited
• Input: features f_1, …, f_n and the empirical distribution p̃(x, y)
• Output: optimal parameter values λ_i*
The Algorithm 2 – Feature Selection
• Input: collection F of candidate features and the empirical distribution p̃(x, y)
• Output: set S of active features and a model P incorporating these features
• Idea: start with S empty; at each step add the candidate feature whose inclusion most increases the log-likelihood of the training data, recompute the model, and stop when no candidate yields a significant gain (a sketch follows below).
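A minimal sketch of the greedy selection loop; train_model and log_likelihood are hypothetical helpers (train_model could, for instance, run Algorithm 1 on the given feature set). Note this simplified version refits the full model for each candidate, whereas Berger et al. use an approximate gain for efficiency:

```python
def select_features(candidates, sample, n_features, train_model, log_likelihood):
    """Greedily grow the active feature set by largest log-likelihood gain."""
    active, model = [], train_model([], sample)
    for _ in range(n_features):
        best_f, best_gain, best_model = None, 0.0, None
        for f in candidates:
            trial = train_model(active + [f], sample)   # refit with f added
            gain = log_likelihood(trial, sample) - log_likelihood(model, sample)
            if gain > best_gain:
                best_f, best_gain, best_model = f, gain, trial
        if best_f is None:            # no candidate improves the likelihood
            break
        active.append(best_f)
        candidates = [f for f in candidates if f is not best_f]
        model = best_model
    return active, model
```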