1. Reduction of Maximum Entropy Models to Hidden Markov Models Joshua Goodman
Machine Learning and Applied Statistics
Microsoft Research
2. Introduction Hidden Markov Models
Digression (why HMMs aren't Bayes Nets)
What are maxent models (=logistic regression = probabilistic perceptron)
Why maxent models are HMMs
Why lots of other things are HMMs
(hidden variable logistic regression, maximum entropy markov models, conditional random fields, maxent with continuous outputs, etc.)
Some quick experiments (new models work better)
Conclusion: Unifies several model types
3. HMM Review Pinball machine example
4. HMMs are Bayes Nets (so far)
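The figure for this slide is not preserved in the transcript; for reference, the factorization the Bayes-net view captures is the standard HMM joint probability (generic symbols s_t for states and o_t for outputs, assumed here):

```latex
P(s_1,\dots,s_T,\;o_1,\dots,o_T)
  = \prod_{t=1}^{T} P(s_t \mid s_{t-1})\,P(o_t \mid s_t),
\qquad P(s_1 \mid s_0) \equiv \pi(s_1)
```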
5. HMMs are not Bayes Nets This is not another "X is a Bayes net" talk
HMMs are a different tool
Full HMM allows non-emitting ε transitions.
Don't output anything or advance the clock
Example: we don't care about bumpers
6. Pinball machine
7. Hard to map εs to Bayes nets. Mapping ε transitions to Bayes nets requires an infinite number of states, even for a finite output.
8. Removal of epsilon arcs
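The removal construction itself is in the slide's figure. As a sketch of the standard weighted-automaton identity it relies on (the function name and matrix encoding below are illustrative, not from the talk): the total weight of all ε-paths between states is (I − E)⁻¹, so composing that closure with the emitting arcs yields an equivalent ε-free model.

```python
import numpy as np

def remove_epsilon_arcs(E, T):
    """Collapse non-emitting (epsilon) arcs in a weighted automaton / HMM.

    E[i, j] : weight of an epsilon (non-emitting) move i -> j
    T[i, j] : weight of an emitting move i -> j

    The total weight of all epsilon paths is sum_k E^k = (I - E)^{-1},
    which converges when every epsilon cycle has weight < 1.
    """
    n = E.shape[0]
    closure = np.linalg.inv(np.eye(n) - E)   # sum over all epsilon paths
    return closure @ T                       # epsilon paths, then one emitting arc

# Tiny example: state 0 can silently hop to state 1 with probability 0.5.
E = np.array([[0.0, 0.5],
              [0.0, 0.0]])
T = np.array([[0.5, 0.0],
              [0.0, 1.0]])
print(remove_epsilon_arcs(E, T))   # rows still sum to 1
```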
9. Reduction of Maxent to HMMs What maxent models are
How to make an HMM for a particular maxent model
Start simple, leave out arcs, and add pieces back in one at a time.
10. Why Maxent Models Lots of applications:
Sentence breaking, language modeling, prepositional phrase attachment, part of speech tagging, parsing, grammar checking, word sense disambiguation, named entity recognition, finding noun phrases, pronoun resolution, lots more in other fields
Very good at combining information from different sources.
Nice mathematical properties
Preserve marginals, convex space (global optimum), maximum likelihood.
11. Maximum Entropy Models Same as logistic regression
Same as a perceptron (single-layer neural net) trained to minimize cross-entropy (log loss).
Consider trying to find P(y | x₁x₂…xₙ)
We'll use the multiplicative form of the equations
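That multiplicative form (whose slide rendering is not preserved here) is, in standard notation, with λ_i the feature weights and f_i the binary features:

```latex
P(y \mid x_1 \dots x_n)
  = \frac{1}{Z(x)} \prod_i \lambda_i^{f_i(x,y)},
\qquad
Z(x) = \sum_{y'} \prod_i \lambda_i^{f_i(x,y')}
```

Setting λ_i = e^{w_i} recovers the familiar exponential (logistic-regression) form.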
12. Maxent Formula with an HMM
13. Maxent Formula with an HMM
14. How to make this with an HMM
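The construction appears as a figure in the original slides. A minimal sketch of the core idea, assuming the usual encoding (one arc per feature, weighted λ_i when the feature fires and 1 otherwise; all names below are illustrative):

```python
def maxent_as_path_weights(lambdas, feats, x, labels):
    """Score each label y as the weight of one path through the
    constructed HMM, then normalize over labels.

    lambdas[i]     : multiplicative weight of feature i
    feats[i](x, y) : True if feature i fires on (x, y)
    """
    score = {}
    for y in labels:
        w = 1.0
        for lam, f in zip(lambdas, feats):
            if f(x, y):
                w *= lam     # arc weighted lambda_i when the feature fires
            # otherwise the arc has weight 1 and contributes nothing
        score[y] = w
    Z = sum(score.values())  # total path weight plays the role of Z(x)
    return {y: s / Z for y, s in score.items()}
```

Normalizing total path weight over the labels reproduces exactly the multiplicative maxent formula above.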
15. Learning
16. Multiple training instances 4 training instances
R,111 L,001 L,010 R,101
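The talk fits these with the HMM machinery. As a sanity check, a plain logistic-regression fit on the same four instances (reading each entry as a label R/L followed by three binary features; the hyperparameters below are illustrative) might look like:

```python
import numpy as np

# The slide's four training instances: label, then three binary features.
X = np.array([[1, 1, 1],    # R,111
              [0, 0, 1],    # L,001
              [0, 1, 0],    # L,010
              [1, 0, 1]])   # R,101
y = np.array([1, 0, 0, 1])  # 1 = R, 0 = L

w, b, lr = np.zeros(3), 0.0, 0.5
for _ in range(2000):                      # gradient descent on log loss
    p = 1 / (1 + np.exp(-(X @ w + b)))     # P(R | x)
    grad = (p - y) / len(y)
    w -= lr * (X.T @ grad + 0.01 * w)      # small L2 term: the data are
    b -= lr * grad.sum()                   # separable by feature 1 alone
print(np.round(p, 3))                      # fitted P(R | x) per instance
```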
17. Why lots of other things are HMMs Maxent Models with Continuous outputs
Maxent Models with hidden variables
Lots of pictures that go by really quickly
Cloud notation for simplicity
18. Continuous Outputs Can train HMMs for either discrete or continuous outputs. Leads immediately to continuous maxent training.
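A minimal formula sketch, assuming Gaussian emissions (the standard continuous choice; the transcript does not pin down a density family here): replace each discrete emission distribution with a density, and Baum-Welch reestimates its parameters directly:

```latex
b_s(y) = \mathcal{N}(y;\,\mu_s,\sigma_s^2)
       = \frac{1}{\sqrt{2\pi}\,\sigma_s}
         \exp\!\left(-\frac{(y-\mu_s)^2}{2\sigma_s^2}\right)
```

Running the maxent-to-HMM reduction with such emissions gives maxent-style training for continuous outputs.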
19. Maxent-style Models with Hidden Variables Hidden variable has value N or P
Non-emitting transitions to maxent models with features that depend on the value of the hidden variable
Automatically learn with the forward-backward algorithm
20. Hidden Variable depends on Maxent Model Hidden variable has value N or P
Value of hidden variable depends on maxent model
Again, automatically learn with the forward-backward algorithm
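A sketch of what "learn with the forward-backward algorithm" amounts to in the one-hidden-variable case, where forward-backward reduces to plain EM over a binary mixture of maxent (logistic) models; everything below (function name, learning rates, iteration counts) is illustrative, not the talk's exact update:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def em_hidden_maxent(X, y, iters=50, lr=0.5, inner=200):
    """EM for a maxent model with one binary hidden variable h (N or P):
    a mixture of two logistic regressions.  With a single hidden variable
    the forward-backward E-step is just a posterior over h."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(2, d))   # one weight vector per h value
    pi = np.array([0.5, 0.5])                # mixing weights P(h)
    for _ in range(iters):
        # E-step: responsibility of each hidden value for each instance
        probs = sigmoid(X @ W.T).T                # shape (2, n)
        lik = np.where(y == 1, probs, 1 - probs)  # P(y | x, h)
        post = pi[:, None] * lik
        post /= post.sum(axis=0, keepdims=True)
        # M-step: mixing weights and posterior-weighted logistic updates
        pi = post.mean(axis=1)
        for h in range(2):
            for _ in range(inner):
                p = sigmoid(X @ W[h])
                W[h] -= lr * X.T @ (post[h] * (p - y)) / n
    return W, pi
```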
21. Maximum Entropy Markov Model (MEMM)
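The slide's diagram is not preserved; the defining property of an MEMM, in the same multiplicative notation, is a locally normalized maxent model per transition:

```latex
P(y_1 \dots y_T \mid x)
  = \prod_{t=1}^{T} P(y_t \mid y_{t-1}, x),
\qquad
P(y_t \mid y_{t-1}, x)
  = \frac{1}{Z(x,\,y_{t-1})} \prod_i \lambda_i^{f_i(x,\,y_{t-1},\,y_t)}
```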
22. Conditional Random Fields (CRF)
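Likewise for the CRF slide: the same arc weights, but with a single global normalizer Z(x) instead of one per transition, which is exactly what distinguishes the CRF from the locally normalized MEMM:

```latex
P(y_1 \dots y_T \mid x)
  = \frac{1}{Z(x)} \prod_{t=1}^{T} \prod_i \lambda_i^{f_i(y_{t-1},\,y_t,\,x,\,t)}
```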
23. Experimental Results: Subject-Verb Agreement This is a hard problem for conventional learners
Need to first locate the subject; we assume no labeled training data
Actual task: given context, determine whether the word is "is" or "are"
Hidden variable maxent model is ideal
Treat the subject, and whether it is singular or plural, as the hidden variable
24. Maxent Model for Subject/Verb
25. Subject Verb Results
26. Conclusions Hidden Markov Models can show connections between a large number of models
Hidden var, continuous, MEMMs, CRFs, more
More natural for these problems than graphical models
Graphical models require an infinite # of states for these problems
Useful for at least one problem
Future: unify HMMs with ε transitions + graphical models?
See my semiring parsing paper and work by Pfeffer, Koller, etc.
27. Actual geometry HMM vars are 0 < p < 1
Maxent vars are 0 < λ < ∞
Introduce more vars, μ
Ratio between p, μ leads to the full range of needed values
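The symbols on this slide did not survive the transcript; with the reconstruction above (HMM parameters p, μ ∈ (0, 1), maxent weights λ ∈ (0, ∞), all names assumed), the geometric point is just:

```latex
p, \mu \in (0, 1)
\;\Longrightarrow\;
\lambda = \frac{p}{\mu} \in (0, \infty)
```

so a ratio of two bounded HMM-style parameters can realize any positive maxent weight.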