Chapter 6. Hidden Markov and Maximum Entropy Models Daniel Jurafsky and James H. Martin 2008
Introduction • Maximum Entropy (MaxEnt) • More widely known as multinomial logistic regression • We begin with a non-sequential classifier • A probabilistic classifier • An exponential or log-linear classifier • Applications: text classification • Sentiment analysis (positive or negative opinion) • Sentence boundary detection
Linear Regression • x^(j): a particular instance in the training set • y^(j)_obs: the observed label of x^(j) in the training set • y^(j)_pred: the value predicted by the linear regression model • The weights are chosen to minimize the sum-squared error: cost(W) = Σ_j ( y^(j)_pred − y^(j)_obs )²
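A minimal sketch (assuming numpy and a made-up toy dataset) of fitting the weights by minimizing the sum-squared error with ordinary least squares:

```python
import numpy as np

# Toy training data (hypothetical): one feature per instance plus a bias column.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])              # column 0 is the intercept feature
y_obs = np.array([1.1, 2.9, 5.2, 6.8])  # observed labels y^(j)_obs

# Ordinary least squares: choose W to minimize sum_j (y_pred - y_obs)^2.
W, *_ = np.linalg.lstsq(X, y_obs, rcond=None)

y_pred = X @ W
sse = np.sum((y_pred - y_obs) ** 2)     # the sum-squared error being minimized
print(W, sse)
```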
Logistic Regression – simplest case of binary classification • Consider whether x is in the class (1, true) or not (0, false) • P(y = true | x) ∈ [0, 1], but the linear combination w·f ∈ (−∞, ∞), so the two cannot be equated directly • The odds ratio P(y = true | x) / (1 − P(y = true | x)) ∈ [0, ∞) • Its logarithm, the logit, ∈ (−∞, ∞), which can be equated with w·f
Logistic Regression – simplest case of binary classification • Setting the logit equal to the linear combination, ln[ P(y = true | x) / (1 − P(y = true | x)) ] = w·f, and solving for the probability gives the logistic (sigmoid) function: P(y = true | x) = e^(w·f) / (1 + e^(w·f)) = 1 / (1 + e^(−w·f))
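A minimal sketch (plain Python, with made-up weights and feature values) of turning the dot product w·f into a probability with the logistic function:

```python
import math

def logistic(z):
    """Map a real-valued score z in (-inf, inf) to a probability in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights and binary feature values for one observation x.
w = [1.5, -0.8, 0.3]
f = [1.0, 0.0, 1.0]

z = sum(wi * fi for wi, fi in zip(w, f))   # w . f
p_true = logistic(z)                       # P(y = true | x)
print(p_true, 1.0 - p_true)                # P(true), P(false)
```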
Maximum Entropy Modeling • Input: x (a word to be tagged or a document to be classified) • Features, e.g.: the word ends in –ing; the previous word is “the” • Each feature fi has a weight wi • The model assigns a probability to each particular class c • Z is a normalizing factor, used to make the probabilities sum to 1
Maximum Entropy Modeling • C = {c1, c2, …, cC} • p(c|x) = exp( Σ_i wi fi(c, x) ) / Z, with the normalization factor Z = Σ_{c′∈C} exp( Σ_i wi fi(c′, x) ) • A feature fi that only takes on the values 0 and 1 is also called an indicator function • In MaxEnt, instead of the notation fi, we will often use the notation fi(c, x), meaning a feature fi for a particular class c and a given observation x
Maximum Entropy Modeling Assume C = {NN, VB}
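A minimal sketch (plain Python, with hypothetical weights) of the two-class example: indicator features fi(c, x) fire for a (class, observation) pair, and Z normalizes over C = {NN, VB}:

```python
import math

CLASSES = ["NN", "VB"]

def features(c, x):
    """Indicator features f_i(c, x); each takes the value 0 or 1."""
    return [
        1.0 if c == "VB" and x["word"].endswith("ing") else 0.0,
        1.0 if c == "NN" and x["prev_word"] == "the" else 0.0,
    ]

# Hypothetical weights w_i, one per feature above.
w = [0.8, 1.2]

def maxent_prob(x):
    # Unnormalized score exp(sum_i w_i f_i(c, x)) for each class,
    # then divide by Z so the probabilities sum to 1.
    scores = {c: math.exp(sum(wi * fi for wi, fi in zip(w, features(c, x))))
              for c in CLASSES}
    Z = sum(scores.values())
    return {c: s / Z for c, s in scores.items()}

x = {"word": "racing", "prev_word": "the"}
print(maxent_prob(x))   # probabilities for NN and VB
```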
HMM vs. MEMM • An MEMM can condition on any useful feature of the input observation; in an HMM this isn’t possible • (Figure: graphical structures of the HMM and the MEMM, with word and class nodes)
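A worked comparison of the two decompositions (standard formulation; the notation Q for the state/class sequence and O for the observed word sequence is assumed rather than taken from the slide):

```latex
% HMM: generative -- reaches P(Q | O) through the likelihood and the prior
\hat{Q} = \arg\max_{Q} P(Q \mid O)
        = \arg\max_{Q} \prod_{i} P(o_i \mid q_i)\, P(q_i \mid q_{i-1})

% MEMM: discriminative -- conditions each state directly on the observation,
% which is what lets it use arbitrary features of the input
\hat{Q} = \arg\max_{Q} P(Q \mid O)
        = \arg\max_{Q} \prod_{i} P(q_i \mid q_{i-1}, o_i)
```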
Conditional Random Fields (CRFs) • CRFs (Lafferty, McCallum, and Pereira, 2001) constitute another conditional model based on maximum entropy • Like MEMMs, CRFs are able to accommodate many possibly correlated features of the observation • However, CRFs are better able to trade off decisions at different sequence positions • MEMMs were found to suffer from the label bias problem
Label Bias • The problem appears when the MEMM contains states with differing numbers of outgoing transitions (out-degrees) • Because the probabilities of the transitions out of any given state must sum to 1, transitions from lower-degree states receive higher probabilities than transitions from higher-degree states • In the extreme case, a transition from a state with out-degree 1 always gets probability 1, effectively ignoring the observation • CRFs do not have this problem because they define a single ME-based distribution over the whole label sequence
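A sketch of the globally normalized distribution referred to in the last bullet (standard linear-chain CRF form; the feature functions f_k and weights w_k are assumed notation, not taken from the slide):

```latex
% Linear-chain CRF: one exponential distribution over the entire label sequence,
% normalized by Z(x), which sums over all label sequences rather than per state.
P(y \mid x) = \frac{1}{Z(x)} \exp\!\Big( \sum_{t} \sum_{k} w_k\, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big( \sum_{t} \sum_{k} w_k\, f_k(y'_{t-1}, y'_t, x, t) \Big)
```

Because Z(x) normalizes over whole label sequences, probability mass is not forced to sum to 1 at each state, which is why the label bias problem does not arise.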