1. Reduction of Maximum Entropy Models to Hidden Markov Models Joshua Goodman
Machine Learning and Applied Statistics
Microsoft Research
2. Introduction Hidden Markov Models
Digression (why HMMs aren't Bayes Nets)
What are maxent models (=logistic regression = probabilistic perceptron)
Why maxent models are HMMs
Why lots of other things are HMMs
(hidden variable logistic regression, maximum entropy markov models, conditional random fields, maxent with continuous outputs, etc.)
Some quick experiments (new models work better)
Conclusion: Unifies several model types
3. HMM Review Pinball machine example
4. HMMs are Bayes Nets (so far)
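The figure for this slide is not preserved in the transcript; for reference, the factorization the Bayes-net view captures is the standard HMM joint probability (generic symbols s_t for states and o_t for outputs, assumed here):

```latex
P(s_1,\dots,s_T,\;o_1,\dots,o_T)
  = \prod_{t=1}^{T} P(s_t \mid s_{t-1})\,P(o_t \mid s_t),
\qquad P(s_1 \mid s_0) \equiv \pi(s_1)
```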
5. HMMs are not Bayes Nets This is not another "X is a Bayes net" talk
HMMs are a different tool
Full HMM allows non-emitting ε transitions.
Don't output anything or advance the clock
Example: we don't care about bumpers
6. Pinball machine
7. Hard to map εs to Bayes nets. Mapping ε transitions to Bayes nets requires an infinite number of states, even for a finite output.
8. Removal of epsilon arcs
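The removal construction itself is in the slide's figure. As a sketch of the standard weighted-automaton identity it relies on (the function name and matrix encoding below are illustrative, not from the talk): the total weight of all ε-paths between states is (I − E)⁻¹, so composing that closure with the emitting arcs yields an equivalent ε-free model.

```python
import numpy as np

def remove_epsilon_arcs(E, T):
    """Collapse non-emitting (epsilon) arcs in a weighted automaton / HMM.

    E[i, j] : weight of an epsilon (non-emitting) move i -> j
    T[i, j] : weight of an emitting move i -> j

    The total weight of all epsilon paths is sum_k E^k = (I - E)^{-1},
    which converges when every epsilon cycle has weight < 1.
    """
    n = E.shape[0]
    closure = np.linalg.inv(np.eye(n) - E)   # sum over all epsilon paths
    return closure @ T                       # epsilon paths, then one emitting arc

# Tiny example: state 0 can silently hop to state 1 with probability 0.5.
E = np.array([[0.0, 0.5],
              [0.0, 0.0]])
T = np.array([[0.5, 0.0],
              [0.0, 1.0]])
print(remove_epsilon_arcs(E, T))   # rows still sum to 1
```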
9. Reduction of Maxent to HMMs What maxent models are
How to make an HMM for a particular maxent model
Start simple, leave out arcs, and add pieces back in one at a time.
10. Why Maxent Models Lots of applications:
Sentence breaking, language modeling, prepositional phrase attachment, part of speech tagging, parsing, grammar checking, word sense disambiguation, named entity recognition, finding noun phrases, pronoun resolution, lots more in other fields
Very good at combining information from different sources.
Nice mathematical properties
Preserve marginals, convex space (global optimum), maximum likelihood.
11. Maximum Entropy Models Same as logistic regression
Same as a perceptron (single-layer neural net) trained to minimize cross-entropy (log loss).
Consider trying to find P(y | x₁x₂…xₙ)
We'll use the multiplicative form of the equations
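That multiplicative form (whose slide rendering is not preserved here) is, in standard notation, with λ_i the feature weights and f_i the binary features:

```latex
P(y \mid x_1 \dots x_n)
  = \frac{1}{Z(x)} \prod_i \lambda_i^{f_i(x,y)},
\qquad
Z(x) = \sum_{y'} \prod_i \lambda_i^{f_i(x,y')}
```

Setting λ_i = e^{w_i} recovers the familiar exponential (logistic-regression) form.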
12. Maxent Formula with an HMM
13. Maxent Formula with an HMM
14. How to make this with an HMM
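The construction appears as a figure in the original slides. A minimal sketch of the core idea, assuming the usual encoding (one arc per feature, weighted λ_i when the feature fires and 1 otherwise; all names below are illustrative):

```python
def maxent_as_path_weights(lambdas, feats, x, labels):
    """Score each label y as the weight of one path through the
    constructed HMM, then normalize over labels.

    lambdas[i]     : multiplicative weight of feature i
    feats[i](x, y) : True if feature i fires on (x, y)
    """
    score = {}
    for y in labels:
        w = 1.0
        for lam, f in zip(lambdas, feats):
            if f(x, y):
                w *= lam     # arc weighted lambda_i when the feature fires
            # otherwise the arc has weight 1 and contributes nothing
        score[y] = w
    Z = sum(score.values())  # total path weight plays the role of Z(x)
    return {y: s / Z for y, s in score.items()}
```

Normalizing total path weight over the labels reproduces exactly the multiplicative maxent formula above.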
15. Learning
16. Multiple training instances 4 training instances
R,111 L,001 L,010 R,101
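The talk fits these with the HMM machinery. As a sanity check, a plain logistic-regression fit on the same four instances (reading each entry as a label R/L followed by three binary features; the hyperparameters below are illustrative) might look like:

```python
import numpy as np

# The slide's four training instances: label, then three binary features.
X = np.array([[1, 1, 1],    # R,111
              [0, 0, 1],    # L,001
              [0, 1, 0],    # L,010
              [1, 0, 1]])   # R,101
y = np.array([1, 0, 0, 1])  # 1 = R, 0 = L

w, b, lr = np.zeros(3), 0.0, 0.5
for _ in range(2000):                      # gradient descent on log loss
    p = 1 / (1 + np.exp(-(X @ w + b)))     # P(R | x)
    grad = (p - y) / len(y)
    w -= lr * (X.T @ grad + 0.01 * w)      # small L2 term: the data are
    b -= lr * grad.sum()                   # separable by feature 1 alone
print(np.round(p, 3))                      # fitted P(R | x) per instance
```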
17. Why lots of other things are HMMs Maxent Models with Continuous outputs
Maxent Models with hidden variables
Lots of pictures that go by really quickly
Cloud notation for simplicity
18. Continuous Outputs Can train HMMs for either discrete or continuous outputs. Leads immediately to continuous maxent training.
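A minimal formula sketch, assuming Gaussian emissions (the standard continuous choice; the transcript does not pin down a density family here): replace each discrete emission distribution with a density, and Baum-Welch reestimates its parameters directly:

```latex
b_s(y) = \mathcal{N}(y;\,\mu_s,\sigma_s^2)
       = \frac{1}{\sqrt{2\pi}\,\sigma_s}
         \exp\!\left(-\frac{(y-\mu_s)^2}{2\sigma_s^2}\right)
```

Running the maxent-to-HMM reduction with such emissions gives maxent-style training for continuous outputs.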
19. Maxent-style Models with Hidden Variables Hidden variable has value N or P
Non-emitting transitions to maxent models with features that depend on the value of the hidden variable
Automatically learn with the forward-backward algorithm
20. Hidden Variable depends on Maxent Model Hidden variable has value N or P
Value of hidden variable depends on maxent model
Again, automatically learn with the forward-backward algorithm
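A sketch of what "learn with the forward-backward algorithm" amounts to in the one-hidden-variable case, where forward-backward reduces to plain EM over a binary mixture of maxent (logistic) models; everything below (function name, learning rates, iteration counts) is illustrative, not the talk's exact update:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def em_hidden_maxent(X, y, iters=50, lr=0.5, inner=200):
    """EM for a maxent model with one binary hidden variable h (N or P):
    a mixture of two logistic regressions.  With a single hidden variable
    the forward-backward E-step is just a posterior over h."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(2, d))   # one weight vector per h value
    pi = np.array([0.5, 0.5])                # mixing weights P(h)
    for _ in range(iters):
        # E-step: responsibility of each hidden value for each instance
        probs = sigmoid(X @ W.T).T                # shape (2, n)
        lik = np.where(y == 1, probs, 1 - probs)  # P(y | x, h)
        post = pi[:, None] * lik
        post /= post.sum(axis=0, keepdims=True)
        # M-step: mixing weights and posterior-weighted logistic updates
        pi = post.mean(axis=1)
        for h in range(2):
            for _ in range(inner):
                p = sigmoid(X @ W[h])
                W[h] -= lr * X.T @ (post[h] * (p - y)) / n
    return W, pi
```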
21. Maximum Entropy Markov Model (MEMM)
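The slide's diagram is not preserved; the defining property of an MEMM, in the same multiplicative notation, is a locally normalized maxent model per transition:

```latex
P(y_1 \dots y_T \mid x)
  = \prod_{t=1}^{T} P(y_t \mid y_{t-1}, x),
\qquad
P(y_t \mid y_{t-1}, x)
  = \frac{1}{Z(x,\,y_{t-1})} \prod_i \lambda_i^{f_i(x,\,y_{t-1},\,y_t)}
```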
22. Conditional Random Fields (CRF)
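Likewise for the CRF slide: the same arc weights, but with a single global normalizer Z(x) instead of one per transition, which is exactly what distinguishes the CRF from the locally normalized MEMM:

```latex
P(y_1 \dots y_T \mid x)
  = \frac{1}{Z(x)} \prod_{t=1}^{T} \prod_i \lambda_i^{f_i(y_{t-1},\,y_t,\,x,\,t)}
```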
23. Experimental Results: Subject-Verb Agreement This is a hard problem for conventional learners
Need to first locate the subject; we assume no labeled training data
Actual task: given context, determine whether the word is "is" or "are"
Hidden variable maxent model is ideal
Treat the subject, and whether it is singular or plural, as the hidden variable
24. Maxent Model for Subject/Verb
25. Subject Verb Results
26. Conclusions Hidden Markov Models can show connections between a large number of models
Hidden var, continuous, MEMMs, CRFs, more
More natural for these problems than graphical models
Graphical models require an infinite # of states for these problems
Useful for at least one problem
Future: unify HMMs with ε transitions + graphical models?
See my semiring parsing paper and work by Pfeffer, Koller, etc.
27. Actual geometry HMM vars are 0 < p < 1
Maxent vars are 0 < λ < ∞
Introduce more vars, μ
Ratio between p, μ leads to the full range of needed values
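The symbols on this slide did not survive the transcript; with the reconstruction above (HMM parameters p, μ ∈ (0, 1), maxent weights λ ∈ (0, ∞), all names assumed), the geometric point is just:

```latex
p, \mu \in (0, 1)
\;\Longrightarrow\;
\lambda = \frac{p}{\mu} \in (0, \infty)
```

so a ratio of two bounded HMM-style parameters can realize any positive maxent weight.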