Natural Language Processing Spring 2007 V. “Juggy” Jagannathan
Course Book Foundations of Statistical Natural Language Processing By Christopher Manning & Hinrich Schütze
Chapter 9 Markov Models March 5, 2007
Markov models • Markov assumption • Suppose X = (X_1, …, X_T) is a sequence of random variables taking values in some finite set S = {s_1, …, s_N}. The Markov properties are: • Limited horizon: P(X_{t+1} = s_k | X_1, …, X_t) = P(X_{t+1} = s_k | X_t), i.e. the value at time t+1 depends only on the value at time t • Time invariant (stationary): these conditional probabilities do not change over time • Stochastic transition matrix A: a_ij = P(X_{t+1} = s_j | X_t = s_i), where a_ij ≥ 0 and Σ_j a_ij = 1 for every i (see the sketch below)
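A minimal sketch of these definitions in code, using a hypothetical two-state chain (the states, initial distribution, and transition probabilities below are made up for illustration and do not come from the slides):

import numpy as np

# Hypothetical two-state Markov chain; any row-stochastic matrix works.
states = ["s1", "s2"]
pi = np.array([0.6, 0.4])            # initial distribution P(X_1 = s_i)
A = np.array([[0.7, 0.3],            # a_ij = P(X_{t+1} = s_j | X_t = s_i)
              [0.4, 0.6]])
assert np.allclose(A.sum(axis=1), 1.0)   # each row of A sums to 1

def sequence_probability(seq):
    """P(X_1, ..., X_T) = pi[x_1] * prod_t a[x_t, x_{t+1}] under the limited-horizon assumption."""
    idx = [states.index(s) for s in seq]
    p = pi[idx[0]]
    for i, j in zip(idx, idx[1:]):
        p *= A[i, j]
    return p

print(sequence_probability(["s1", "s1", "s2"]))  # 0.6 * 0.7 * 0.3 = 0.126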
Hidden Markov Model Example What is the probability of the output sequence {lem, ice_t} if the machine starts in state CP? P = 0.3×0.7×0.1 + 0.3×0.3×0.7 = 0.021 + 0.063 = 0.084 (emit lem from CP, then either stay in CP and emit ice_t, or move to IP and emit ice_t)
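A small sketch that reproduces this number by summing over all hidden paths. Only 0.3, 0.7, and 0.1 appear on the slide; the remaining transition and emission values below are the ones from the book's crazy soft drink machine example and should be treated as assumptions here:

from itertools import product

states = ["CP", "IP"]
trans = {"CP": {"CP": 0.7, "IP": 0.3},
         "IP": {"CP": 0.5, "IP": 0.5}}
emit = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
        "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

def prob_obs(obs, start="CP"):
    """Sum over all hidden paths: emit from the current state, then transition."""
    total = 0.0
    for rest in product(states, repeat=len(obs) - 1):
        path = (start,) + rest
        p = 1.0
        for t, o in enumerate(obs):
            p *= emit[path[t]][o]
            if t + 1 < len(obs):
                p *= trans[path[t]][path[t + 1]]
        total += p
    return total

print(prob_obs(["lem", "ice_t"]))  # 0.021 + 0.063 = 0.084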
Why use HMMs? • Underlying hidden events generate the surface observable events • E.g., predicting the weather based on the dampness of seaweed • http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html • Linear interpolation in n-gram models: P_li(w_n | w_{n-2}, w_{n-1}) = λ_1 P_1(w_n) + λ_2 P_2(w_n | w_{n-1}) + λ_3 P_3(w_n | w_{n-2}, w_{n-1}) (sketch below)
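A tiny sketch of the interpolation formula; the lambda weights and the component models p1, p2, p3 are placeholders (in practice the lambdas are estimated on held-out data and must sum to 1):

def interpolated_trigram(w, prev1, prev2, p1, p2, p3, lambdas=(0.1, 0.3, 0.6)):
    """P_li(w | prev2, prev1) = l1*P1(w) + l2*P2(w | prev1) + l3*P3(w | prev2, prev1)."""
    l1, l2, l3 = lambdas
    return l1 * p1(w) + l2 * p2(w, prev1) + l3 * p3(w, prev1, prev2)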
Look at notes from David Meir Blei [UC Berkeley] http://www-nlp.stanford.edu/fsnlp/hmm-chap/blei-hmm-ch9.ppt Slides 1-13
Forward Procedure
where α_i(t) = P(o_1 ⋯ o_{t-1}, X_t = i | μ)
Initialization: α_i(1) = π_i, for 1 ≤ i ≤ N
Induction: α_j(t+1) = Σ_{i=1..N} α_i(t) a_ij b_{ij o_t}
Total computation: P(O | μ) = Σ_{i=1..N} α_i(T+1), requiring about 2N²T multiplications
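A minimal sketch of the forward procedure, using the more common state-emission formulation (emission depends only on the current state) rather than the book's arc-emission notation above. The parameters are the soft drink machine values assumed earlier, with the machine started in CP:

import numpy as np

pi = np.array([1.0, 0.0])                 # start in CP with certainty
A  = np.array([[0.7, 0.3], [0.5, 0.5]])   # transitions between CP and IP
B  = np.array([[0.6, 0.1, 0.3],           # emission of cola, ice_t, lem from CP
               [0.1, 0.7, 0.2]])          # ... and from IP
obs = [2, 1]                              # observation sequence lem, ice_t (column indices)

def forward(pi, A, B, obs):
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]  # induction
    return alpha, alpha[-1].sum()                     # total probability P(O | mu)

alpha, p_obs = forward(pi, A, B, obs)
print(p_obs)  # 0.084, matching the hand calculation in the soft drink machine example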
Backward Procedure
where β_i(t) = P(o_t ⋯ o_T | X_t = i, μ)
Initialization: β_i(T+1) = 1, for 1 ≤ i ≤ N
Induction: β_i(t) = Σ_{j=1..N} a_ij b_{ij o_t} β_j(t+1)
Total computation: P(O | μ) = Σ_{i=1..N} π_i β_i(1)
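A matching sketch of the backward procedure, under the same state-emission assumption and reusing pi, A, B, obs from the forward sketch above; the total computed from beta agrees with the forward total:

def backward(pi, A, B, obs):
    N, T = len(pi), len(obs)
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0                                   # initialization
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])  # induction
    total = np.sum(pi * B[:, obs[0]] * beta[0])         # total probability P(O | mu)
    return beta, total

beta, p_obs_check = backward(pi, A, B, obs)
print(p_obs_check)  # 0.084 again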
Finding the best state sequence
To determine the state sequence that best explains the observations, let γ_i(t) = P(X_t = i | O, μ) = α_i(t) β_i(t) / Σ_{j=1..N} α_j(t) β_j(t)
Individually, the most likely state at time t is X̂_t = argmax_{1≤i≤N} γ_i(t)
This approach, however, does not correctly estimate the most likely state sequence: the individually most likely states need not form a coherent (or even possible) path.
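A short sketch of the per-position posterior γ_i(t), reusing alpha and beta from the sketches above (the argmax at each position gives the individually most likely states, which need not form a good overall path):

gamma = alpha * beta / (alpha * beta).sum(axis=1, keepdims=True)  # gamma_i(t)
print(gamma.argmax(axis=1))  # individually most likely state at each time step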
Finding the best state sequence: Viterbi algorithm
Store the most probable path that leads to a given node
Initialization: δ_j(1) = π_j, for 1 ≤ j ≤ N
Induction: δ_j(t+1) = max_{1≤i≤N} δ_i(t) a_ij b_{ij o_t}
Store backtrace: ψ_j(t+1) = argmax_{1≤i≤N} δ_i(t) a_ij b_{ij o_t}
Termination and path readout: X̂_{T+1} = argmax_{1≤i≤N} δ_i(T+1), then X̂_t = ψ_{X̂_{t+1}}(t+1), with P(X̂) = max_{1≤i≤N} δ_i(T+1)
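A minimal Viterbi sketch under the same state-emission assumption, reusing pi, A, B, obs from the forward sketch above:

def viterbi(pi, A, B, obs):
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                       # initialization
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A             # scores[i, j] = delta_i(t-1) * a_ij
        psi[t] = scores.argmax(axis=0)                 # store backtrace pointers
        delta[t] = scores.max(axis=0) * B[:, obs[t]]   # induction
    path = [int(delta[-1].argmax())]                   # termination
    for t in range(T - 1, 0, -1):                      # read the path off backwards
        path.append(int(psi[t, path[-1]]))
    return path[::-1], delta[-1].max()

path, p_best = viterbi(pi, A, B, obs)
print(path, p_best)  # [0, 1], i.e. CP then IP, with path probability 0.063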
Parameter Estimation
Probability of traversing the arc from state i to state j at time t, given the observation sequence O:
p_t(i, j) = P(X_t = i, X_{t+1} = j | O, μ) = α_i(t) a_ij b_{ij o_t} β_j(t+1) / Σ_{m=1..N} α_m(t) β_m(t)
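A short sketch of this quantity (often written ξ_t(i, j)) in the state-emission variant, reusing alpha, beta, A, B, obs, and p_obs from the sketches above:

# xi[t, i, j] = alpha_i(t) * a_ij * b_j(o_{t+1}) * beta_j(t+1) / P(O | mu)
T, N = alpha.shape
xi = np.zeros((T - 1, N, N))
for t in range(T - 1):
    xi[t] = (alpha[t][:, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1][None, :]) / p_obs
print(xi[0])        # expected arc traversals at t = 0; summing over j gives gamma_i(t)
print(xi[0].sum())  # 1.0: some arc is traversed at time t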