Hidden Markov Models milidiu@inf.puc-rio.br PUC-Rio University
Agenda • Modeling • Problems • Language modeling • Tagging • Parameter estimation • Algorithms • Rabiner 1989 • Viterbi 1967 • Baum-Welch
Formulation & Use • Formulation: symbols, information • Use: emissions (observable), states (hidden)
Finite state machine Information structure • States • initial • transient • terminal • Transitions (figure: six-state transition diagram, states 1–6)
FSM & emissions • Observables: emissions • Hidden: states (figure: the six-state diagram, now with emissions)
States & emissions • States: title, authors • Emissions: “BRAVE NEW WORLD by Aldous Huxley” • Emissions are state manifestations
Interdependence • Conditional information P(a | b) = P(a,b) / P(b) • Information reversion P(a | b) · P(b) = P(a,b) = P(b | a) · P(a) • Joint information (chain rule) P(a,b,c,d) = P(a | b,c,d) · P(b,c,d) = P(a | b,c,d) · P(b | c,d) · P(c | d) · P(d)
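The information-reversion identity above can be checked numerically. A minimal Python sketch, using a hypothetical joint table over two binary events a and b (the numbers are made up for illustration):

```python
# Hypothetical joint distribution P(a, b) over binary a and b.
P = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

def p_a(a):
    # Marginal P(a) by summing out b.
    return sum(p for (x, _), p in P.items() if x == a)

def p_b(b):
    # Marginal P(b) by summing out a.
    return sum(p for (_, y), p in P.items() if y == b)

def p_a_given_b(a, b):
    # Conditional information: P(a | b) = P(a, b) / P(b).
    return P[(a, b)] / p_b(b)

def p_b_given_a(b, a):
    # Reversed conditional: P(b | a) = P(a, b) / P(a).
    return P[(a, b)] / p_a(a)

# Both sides of the reversion identity equal the joint P(a, b).
lhs = p_a_given_b(1, 1) * p_b(1)   # P(a | b) · P(b)
rhs = p_b_given_a(1, 1) * p_a(1)   # P(b | a) · P(a)
```

Both `lhs` and `rhs` recover the joint entry P(a=1, b=1) = 0.4, which is exactly the reversion written on the slide.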
Interdependence • Joint distribution P(x1, x2, x3, x4, o1, o2, o3, o4) = P(x4 | x3) · P(x3 | x2) · P(x2 | x1) · P(x1) · P(o1 | x1) · P(o2 | x2) · P(o3 | x3) · P(o4 | x4) (figure: chain X1 → X2 → X3 → X4 with emissions O1, O2, O3, O4)
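The factorization above is straightforward to compute once the conditional tables are given. A minimal Python sketch with hypothetical two-state, two-symbol tables (all numbers invented for illustration):

```python
# Hypothetical tables for the chain-structured model on the slide:
# P(x1), P(x_i | x_{i-1}), and P(o_i | x_i).
p_x1 = {"H": 0.5, "L": 0.5}
p_trans = {"H": {"H": 0.8, "L": 0.2}, "L": {"H": 0.3, "L": 0.7}}
p_emit = {"H": {"a": 0.9, "b": 0.1}, "L": {"a": 0.4, "b": 0.6}}

def joint(xs, os):
    # P(x1..xn, o1..on) = P(x1) · prod P(x_i | x_{i-1}) · prod P(o_i | x_i)
    p = p_x1[xs[0]] * p_emit[xs[0]][os[0]]
    for i in range(1, len(xs)):
        p *= p_trans[xs[i - 1]][xs[i]] * p_emit[xs[i]][os[i]]
    return p

p = joint(["H", "H", "L", "L"], ["a", "b", "b", "a"])
```

Each factor in the product corresponds one-to-one to an edge of the chain diagram: a transition factor per X→X edge and an emission factor per X→O edge.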
Marginal distributions • Input: joint P(a,b,c,d) • Output: marginals P(a,b); P(a,d); P(a,c,d); P(b,d) • Solution: sum over mutually disjoint events • Ex.: P(a,d) = Σb Σc P(a,b,c,d)
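The "sum over mutually disjoint events" recipe is a one-liner in code. A minimal Python sketch over a hypothetical joint table of four binary variables (the weights are arbitrary, normalized so the table is a valid distribution):

```python
import itertools

# Hypothetical joint P(a, b, c, d) over binary variables,
# built as a normalized table purely for illustration.
vals = list(itertools.product([0, 1], repeat=4))
weights = {v: i + 1 for i, v in enumerate(vals)}
Z = sum(weights.values())
P = {v: w / Z for v, w in weights.items()}

def marginal_ad(a, d):
    # P(a, d) = sum over b, c of P(a, b, c, d)
    return sum(P[(a, b, c, d)] for b in (0, 1) for c in (0, 1))

# The marginal is itself a distribution: its entries sum to 1.
total = sum(marginal_ad(a, d) for a in (0, 1) for d in (0, 1))
```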
Independence • Independence P(a | b) = P(a) • P(a,b) = P(a) · P(b) • Conditional independence P(a | b, c) = P(a | c) • P(a, b | c) = P(a | c) · P(b | c)
Joint independence • Given P(x | a) = P(x) = P(x | b) • then P(x | a) = P(x | b) • and P(x | a, b) = P(x | b, b) = P(x | b) = P(x) • hence P(x | a, b) = P(x)
Markov modeling • Markov property P(xn | xn-1, xn-2, …, x1) = P(xn | xn-1) • Given the present, the past provides no relevant information about the future • Markov chain P(a,b,c,d) = P(a | b) · P(b | c) · P(c | d) · P(d)
Markov chain • P(x1, …, xn) = ∏i=1..n P(xi | xi-1) • For k = 1, …, n let x(k) ≡ (x1, …, xk) and x̄(k) ≡ (xk, …, xn) • P(x(k)) = ∏i=1..k P(xi | xi-1) • Dem.: compute the marginal by summation • P(x(n)) = P(x̄(k+1) | xk) · P(x(k) | x0) • Structural decomposition
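The product formula for a chain's path probability can be sketched directly. A minimal Python example with a hypothetical two-state chain (initial distribution and transition matrix are toy numbers):

```python
# Hypothetical two-state Markov chain: P(x1) and P(x_i | x_{i-1}).
init = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3},
         "B": {"A": 0.2, "B": 0.8}}

def chain_prob(states):
    # P(x1, ..., xn) = P(x1) · prod_{i=2..n} P(x_i | x_{i-1})
    p = init[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[prev][cur]
    return p

# Structural decomposition: the prefix probability is itself a product
# of the same factors, so P(x(n)) splits at any cut point k.
p = chain_prob(["A", "A", "B", "B"])  # 0.6 · 0.7 · 0.3 · 0.8
```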
Hidden Markov model • State transitions: the next state depends only on the current state • Emissions: the emission value depends only on the current (and possibly the previous) state
HMM “An HMM is a finite state automaton with stochastic transitions and symbol emissions.” (Rabiner 1989) • A set of (hidden) states: 1, 2, 3, …, N • State transitions • Vocabulary of output symbols Σ = {σ0, σ1, …, σm} • State transition probabilities P(q → q′) = P(xn = q′ | xn-1 = q) • Output observation likelihoods B(q ↑ σ) = P(on = σ | xn = q) (figure: six-state transition diagram)
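Rabiner's "stochastic transitions and symbol emissions" can be made concrete by sampling from a toy HMM. A minimal Python sketch; the states, vocabulary, and all probability tables are hypothetical:

```python
import random

# Hypothetical HMM in the slide's notation: states, vocabulary Σ,
# initial distribution π, transition matrix A, emission matrix B.
states = ["1", "2"]
sigma = ["x", "y"]
pi = {"1": 0.5, "2": 0.5}
A = {"1": {"1": 0.9, "2": 0.1}, "2": {"1": 0.4, "2": 0.6}}
B = {"1": {"x": 0.8, "y": 0.2}, "2": {"x": 0.3, "y": 0.7}}

def sample(n, rng=random.Random(0)):
    """Walk the hidden chain for n steps, emitting one symbol per state."""
    def draw(dist):
        # Sample a key from a {key: probability} table.
        r, acc = rng.random(), 0.0
        for k, p in dist.items():
            acc += p
            if r < acc:
                return k
        return k  # guard against floating-point shortfall

    q = draw(pi)          # stochastic start
    out = []
    for _ in range(n):
        out.append(draw(B[q]))  # symbol emission from the current state
        q = draw(A[q])          # stochastic transition
    return out

obs = sample(5)  # a list of 5 symbols from Σ; the states stay hidden
```

Only the emitted symbols are returned; the state path is discarded, which is exactly what makes the model "hidden".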
Problems • Language modeling: the probability of a string o (of length l) being emitted by an HMM M; naive enumeration considers N^l state sequences • Tagging: recover the state sequence V(o | M) that has the highest probability of having produced the observation sequence o • Parameter estimation: given the observation sequence and a set of states, find the parameters that make the observation sequence most likely
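The first two problems have classical dynamic-programming solutions, previewed in the agenda (Rabiner 1989 for the forward algorithm, Viterbi 1967 for tagging). A minimal Python sketch of both on a hypothetical two-state HMM (all parameter values invented):

```python
# Hypothetical HMM parameters: initial π, transitions A, emissions B.
pi = {"H": 0.6, "L": 0.4}
A = {"H": {"H": 0.7, "L": 0.3}, "L": {"H": 0.4, "L": 0.6}}
B = {"H": {"a": 0.5, "b": 0.5}, "L": {"a": 0.1, "b": 0.9}}

def forward(obs):
    """Language modeling: P(o | M) in O(N² · l) instead of N^l paths."""
    alpha = {q: pi[q] * B[q][obs[0]] for q in pi}
    for o in obs[1:]:
        alpha = {q: sum(alpha[r] * A[r][q] for r in alpha) * B[q][o]
                 for q in pi}
    return sum(alpha.values())

def viterbi(obs):
    """Tagging: the state sequence with highest probability of emitting obs."""
    delta = {q: (pi[q] * B[q][obs[0]], [q]) for q in pi}
    for o in obs[1:]:
        new = {}
        for q in pi:
            p, best = max((delta[r][0] * A[r][q], r) for r in delta)
            new[q] = (p * B[q][o], delta[best][1] + [q])
        delta = new
    return max(delta.values(), key=lambda t: t[0])[1]
```

The forward recursion replaces the max in Viterbi with a sum: the two algorithms share the same trellis, differing only in how paths into each state are combined. The third problem (parameter estimation, Baum-Welch) reuses these quantities but is omitted here.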