
Chapter 6. Hidden Markov and Maximum Entropy Models


Presentation Transcript


  1. Chapter 6. Hidden Markov and Maximum Entropy Models Daniel Jurafsky and James H. Martin, 2008

  2. Introduction • Maximum Entropy (MaxEnt) • More widely known as multinomial logistic regression • We begin with a non-sequential classifier • A probabilistic classifier • An exponential, or log-linear, classifier • Applications: text classification • Sentiment analysis: positive or negative opinion • Sentence boundary detection

  3. Linear Regression

  4. Linear Regression • x(j): a particular instance • y(j)obs: the label of x(j) observed in the training set • y(j)pred: the value predicted by the linear regression model • The weights are chosen to minimize the sum-squared error: cost(W) = Σj (y(j)pred − y(j)obs)²
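
A minimal sketch of the least-squares fit described above; the function name `fit_linear_regression` and the toy data are illustrative, not from the slides:

```python
import numpy as np

def fit_linear_regression(X, y_obs):
    """Least-squares fit: choose W minimizing sum_j (y(j)pred - y(j)obs)^2."""
    # Append a constant column so the last weight acts as the intercept w0.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    # lstsq minimizes ||X1 @ W - y_obs||^2, i.e. exactly the sum-squared error.
    W, *_ = np.linalg.lstsq(X1, y_obs, rcond=None)
    return W

# Toy data: y is roughly 2*x + 1 plus a little noise.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.1, 2.9, 5.2, 6.8])
print(fit_linear_regression(X, y))  # approximately [2.0, 1.0]: slope, intercept
```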

  5. Logistic Regression – simplest case of binary classification • Consider whether x is in the class (1, true) or not (0, false) • The probability p = P(y=true|x) lies in [0,1], while the linear score w·f ranges over (−∞,∞), so the two cannot be equated directly • The odds p/(1−p) lies in [0,∞), and the log odds (logit) ln(p/(1−p)) ranges over (−∞,∞) • Setting the logit equal to w·f matches the two ranges, giving p = 1/(1 + e^(−w·f))

  6. Logistic Regression – simplest case of binary classification

  7. Logistic Regression – Classification
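
The classification rule on this slide reduces to a dot product: P(y=true|x) > 0.5 exactly when w·f > 0. A small sketch, with made-up weights and feature values:

```python
import numpy as np

def sigmoid(z):
    # Maps the log odds z = w·f from (-inf, inf) into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def classify(w, f):
    """Return True iff P(y=true|x) = sigmoid(w·f) exceeds 0.5, i.e. iff w·f > 0."""
    return sigmoid(np.dot(w, f)) > 0.5

w = np.array([1.5, -2.0, 0.3])  # hypothetical learned weights
f = np.array([1.0, 0.0, 1.0])   # hypothetical feature values for one x
print(classify(w, f))           # w·f = 1.8 > 0, so True
```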

  8. Advanced: Learning in logistic regression
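
The body of this slide is not preserved in the transcript. As a hedged sketch: logistic regression weights are typically learned by gradient ascent on the conditional log-likelihood Σj log P(y(j)|x(j)), whose gradient has the simple form Σj (y(j) − p(j)) f(j):

```python
import numpy as np

def train_logistic(F, y, lr=0.1, epochs=1000):
    """Gradient ascent on the conditional log-likelihood of 0/1 labels y.

    F: (n_examples, n_features) feature matrix, one row f(j) per example.
    """
    w = np.zeros(F.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(F @ w)))  # P(y=1|x) for every example
        w += lr * (F.T @ (y - p))           # gradient: sum_j (y_j - p_j) f(j)
    return w

F = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])
print(train_logistic(F, y))  # a large positive weight on the first feature
```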

  9. Maximum Entropy Modeling • Input: x (a word to be tagged, or a document to be classified) • Features, for example: ends in -ing; the previous word is "the" • Each feature fi has a weight wi • For a particular class c, P(c|x) = (1/Z) exp(Σi wi fi) • Z is a normalizing factor, used to make the probabilities sum to 1

  10. Maximum Entropy Modeling • C = {c1, c2, …, cC} • Normalization: P(c|x) = exp(Σi wi fi(c,x)) / Σc′∈C exp(Σi wi fi(c′,x)) • A feature fi that takes on only the values 0 and 1 is also called an indicator function • In MaxEnt, instead of the notation fi, we will often use the notation fi(c,x), meaning the feature fi for a particular class c and a given observation x

  11. Maximum Entropy Modeling • Example: assume C = {NN, VB}; the model exponentiates each class's weighted feature sum and normalizes over both classes (a worked sketch follows below)
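
A worked sketch of the NN/VB example. The indicator features and weights below are made up for illustration (the slide's actual numbers are not in the transcript), but the computation, exponentiating each class's weighted feature sum and dividing by Z, follows the formulas on the previous slides:

```python
import math

# Hypothetical indicator features fi(c, x); real taggers would use many more.
def features(c, x):
    return [
        1.0 if c == "NN" and x["word"].endswith("ing") else 0.0,
        1.0 if c == "VB" and x["prev"] == "to" else 0.0,
        1.0 if c == "NN" and x["prev"] == "the" else 0.0,
    ]

weights = [0.8, 1.2, 1.5]  # made-up weights for illustration
classes = ["NN", "VB"]

def maxent_prob(c, x):
    """P(c|x) = exp(sum_i wi fi(c,x)) / Z, where Z sums over all classes."""
    score = lambda cl: math.exp(sum(w * f for w, f in zip(weights, features(cl, x))))
    Z = sum(score(cl) for cl in classes)  # normalizer: probabilities sum to 1
    return score(c) / Z

x = {"word": "race", "prev": "to"}
for c in classes:
    print(c, round(maxent_prob(c, x), 3))  # VB gets the higher probability
```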

  12. Learning Maximum Entropy Models
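
This slide's body is also not in the transcript. As a hedged sketch: MaxEnt weights are chosen to maximize the conditional log-likelihood of the training data, and the gradient for each class's weight vector is the difference between observed and model-expected feature counts (classically optimized with iterative scaling, today usually with gradient methods):

```python
import numpy as np

def train_maxent(F, y, n_classes, lr=0.1, epochs=500):
    """Gradient ascent on sum_j log P(y_j | x_j) for a multiclass MaxEnt model.

    Each class gets its own weight vector over the columns of F, which is one
    way of realizing class-dependent features fi(c, x).
    """
    W = np.zeros((n_classes, F.shape[1]))
    for _ in range(epochs):
        P = np.exp(F @ W.T)                # exp(sum_i wi fi(c, x)) per class
        P /= P.sum(axis=1, keepdims=True)  # divide by Z so each row sums to 1
        for c in range(n_classes):
            # Gradient = observed feature counts minus expected counts.
            W[c] += lr * (F.T @ ((y == c).astype(float) - P[:, c]))
    return W

F = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0, 1, 0])
print(train_maxent(F, y, n_classes=2))
```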

  13. HMM vs. MEMM • An MEMM can condition on any useful feature of the input observation; in an HMM this isn't possible, because an HMM is generative and must model the observation likelihood P(word|class) rather than condition on arbitrary properties of the word • [Figure: dependency diagrams contrasting HMM and MEMM, with word and class nodes]
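
A schematic contrast of the per-step quantities, with all probability tables assumed given; `maxent_prob` stands in for a trained MaxEnt model such as the sketch above:

```python
def hmm_step_score(tag, prev_tag, word, emit_p, trans_p):
    # HMM (generative): likelihood of the word given the tag, times the
    # tag-transition probability. No room for extra observation features.
    return emit_p[tag][word] * trans_p[prev_tag][tag]

def memm_step_score(tag, prev_tag, word, maxent_prob):
    # MEMM (discriminative): directly estimates P(tag | features), so it can
    # condition on capitalization, suffixes, neighboring words, and so on.
    feats = {"word": word, "prev_tag": prev_tag,
             "suffix": word[-3:], "capitalized": word[0].isupper()}
    return maxent_prob(tag, feats)
```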

  14. Conditional Random Fields (CRFs) • CRFs (Lafferty, McCallum, et al. 2001) constitute another conditional model based on maximum entropy • Like MEMMs, CRFs are able to accommodate many possibly correlated features of the observation • However, CRFs are better able to trade off decisions at different sequence positions • MEMMs were found to suffer from the label bias problem

  15. Label Bias • The problem appears when the MEMM contains states with different out-degrees (numbers of outgoing transitions) • Because the probabilities of the transitions leaving any given state must sum to 1, transitions from states with low out-degree receive higher probabilities than transitions from states with high out-degree • In the extreme case, a transition from a state with out-degree 1 always gets probability 1, effectively ignoring the observation • CRFs do not have this problem because they define a single MaxEnt-style distribution over the whole label sequence
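
A tiny numeric illustration of the out-degree argument (the states and raw scores are invented): local normalization in an MEMM forces a state with a single outgoing transition to assign it probability 1, whatever the observation says:

```python
def normalize(raw_scores):
    # MEMMs normalize transition scores locally, per source state.
    Z = sum(raw_scores.values())
    return {state: s / Z for state, s in raw_scores.items()}

# s1 has out-degree 1; whatever raw score the observation produces,
# its single transition is renormalized to probability 1.0.
print(normalize({"s2": 7.3}))     # {'s2': 1.0}
print(normalize({"s2": 0.0001}))  # still {'s2': 1.0}: observation ignored

# s2 has out-degree 3; its transitions must share the probability mass.
print(normalize({"s1": 2.0, "s2": 1.0, "s3": 1.0}))  # {'s1': 0.5, 's2': 0.25, 's3': 0.25}
```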
