Hidden Markov Models milidiu@inf.puc-rio.br PUC-Rio University
Agenda • Modeling • Problems • Language modeling • Tagging • Parameter estimation • Algorithms • Rabiner 1989 • Viterbi 1967 • Baum-Welch
Formulation & Use • Formulation: symbols, information • Use: emissions (observable), states (hidden)
Finite state machine Information structure • States • initial • transient • terminal • Transitions (figure: six-state transition diagram, states 1–6)
FSM & emissions • Observables: emissions • Hidden: states (figure: the six-state diagram, now with emissions)
States & emissions • States: title, authors • Emissions: “BRAVE NEW WORLD by Aldous Huxley” • Emissions are state manifestations
Interdependence • Conditional information P(a | b) = P(a,b) / P(b) • Information reversion P(a | b) · P(b) = P(a,b) = P(b | a) · P(a) • Joint information (chain rule) P(a,b,c,d) = P(a | b,c,d) · P(b,c,d) = P(a | b,c,d) · P(b | c,d) · P(c | d) · P(d)
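The information-reversion identity above can be checked numerically. A minimal Python sketch, using a hypothetical joint table over two binary events a and b (the numbers are made up for illustration):

```python
# Hypothetical joint distribution P(a, b) over binary a and b.
P = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

def p_a(a):
    # Marginal P(a) by summing out b.
    return sum(p for (x, _), p in P.items() if x == a)

def p_b(b):
    # Marginal P(b) by summing out a.
    return sum(p for (_, y), p in P.items() if y == b)

def p_a_given_b(a, b):
    # Conditional information: P(a | b) = P(a, b) / P(b).
    return P[(a, b)] / p_b(b)

def p_b_given_a(b, a):
    # Reversed conditional: P(b | a) = P(a, b) / P(a).
    return P[(a, b)] / p_a(a)

# Both sides of the reversion identity equal the joint P(a, b).
lhs = p_a_given_b(1, 1) * p_b(1)   # P(a | b) · P(b)
rhs = p_b_given_a(1, 1) * p_a(1)   # P(b | a) · P(a)
```

Both `lhs` and `rhs` recover the joint entry P(a=1, b=1) = 0.4, which is exactly the reversion written on the slide.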
Interdependence • Joint distribution P(x1, x2, x3, x4, o1, o2, o3, o4) = P(x4 | x3) · P(x3 | x2) · P(x2 | x1) · P(x1) · P(o1 | x1) · P(o2 | x2) · P(o3 | x3) · P(o4 | x4) (figure: chain X1 → X2 → X3 → X4 with emissions O1, O2, O3, O4)
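The factorization above is straightforward to compute once the conditional tables are given. A minimal Python sketch with hypothetical two-state, two-symbol tables (all numbers invented for illustration):

```python
# Hypothetical tables for the chain-structured model on the slide:
# P(x1), P(x_i | x_{i-1}), and P(o_i | x_i).
p_x1 = {"H": 0.5, "L": 0.5}
p_trans = {"H": {"H": 0.8, "L": 0.2}, "L": {"H": 0.3, "L": 0.7}}
p_emit = {"H": {"a": 0.9, "b": 0.1}, "L": {"a": 0.4, "b": 0.6}}

def joint(xs, os):
    # P(x1..xn, o1..on) = P(x1) · prod P(x_i | x_{i-1}) · prod P(o_i | x_i)
    p = p_x1[xs[0]] * p_emit[xs[0]][os[0]]
    for i in range(1, len(xs)):
        p *= p_trans[xs[i - 1]][xs[i]] * p_emit[xs[i]][os[i]]
    return p

p = joint(["H", "H", "L", "L"], ["a", "b", "b", "a"])
```

Each factor in the product corresponds one-to-one to an edge of the chain diagram: a transition factor per X→X edge and an emission factor per X→O edge.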
Marginal distributions • Input: joint P(a,b,c,d) • Output: marginals P(a,b); P(a,d); P(a,c,d); P(b,d) • Solution: sum over mutually disjoint events • Ex.: P(a,d) = Σb Σc P(a,b,c,d)
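The "sum over mutually disjoint events" recipe is a one-liner in code. A minimal Python sketch over a hypothetical joint table of four binary variables (the weights are arbitrary, normalized so the table is a valid distribution):

```python
import itertools

# Hypothetical joint P(a, b, c, d) over binary variables,
# built as a normalized table purely for illustration.
vals = list(itertools.product([0, 1], repeat=4))
weights = {v: i + 1 for i, v in enumerate(vals)}
Z = sum(weights.values())
P = {v: w / Z for v, w in weights.items()}

def marginal_ad(a, d):
    # P(a, d) = sum over b, c of P(a, b, c, d)
    return sum(P[(a, b, c, d)] for b in (0, 1) for c in (0, 1))

# The marginal is itself a distribution: its entries sum to 1.
total = sum(marginal_ad(a, d) for a in (0, 1) for d in (0, 1))
```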
Independence • Independence P(a | b) = P(a) • P(a,b) = P(a) · P(b) • Conditional independence P(a | b, c) = P(a | c) • P(a, b | c) = P(a | c) · P(b | c)
Joint independence • Given P(x | a) = P(x) = P(x | b) • then P(x | a) = P(x | b) • and P(x | a, b) = P(x | b, b) = P(x | b) = P(x) • hence P(x | a, b) = P(x)
Markov modeling • Markov property P(xn | xn-1, xn-2, …, x1) = P(xn | xn-1) • Given the present, the past provides no relevant information about the future • Markov chain P(a,b,c,d) = P(a | b) · P(b | c) · P(c | d) · P(d)
Markov chain • P(x1, …, xn) = ∏i=1..n P(xi | xi-1) • For k = 1, …, n let x(k) ≡ (x1, …, xk) and x̄(k) ≡ (xk, …, xn) • P(x(k)) = ∏i=1..k P(xi | xi-1) • Dem.: compute the marginal by summation • P(x(n)) = P(x̄(k+1) | xk) · P(x(k) | x0) • Structural decomposition
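The product formula for a chain's path probability can be sketched directly. A minimal Python example with a hypothetical two-state chain (initial distribution and transition matrix are toy numbers):

```python
# Hypothetical two-state Markov chain: P(x1) and P(x_i | x_{i-1}).
init = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3},
         "B": {"A": 0.2, "B": 0.8}}

def chain_prob(states):
    # P(x1, ..., xn) = P(x1) · prod_{i=2..n} P(x_i | x_{i-1})
    p = init[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[prev][cur]
    return p

# Structural decomposition: the prefix probability is itself a product
# of the same factors, so P(x(n)) splits at any cut point k.
p = chain_prob(["A", "A", "B", "B"])  # 0.6 · 0.7 · 0.3 · 0.8
```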
Hidden Markov model • State transitions: the next state depends only on the current state • Emissions: the emission value depends only on the current (and possibly the previous) state
HMM “An HMM is a finite state automaton with stochastic transitions and symbol emissions.” (Rabiner 1989) • A set of (hidden) states: 1, 2, 3, …, N • State transitions • Vocabulary of output symbols Σ = {σ0, σ1, …, σm} • State transition probabilities P(q → q′) = P(xn = q′ | xn-1 = q) • Output observation likelihoods B(q ↑ σ) = P(on = σ | xn = q) (figure: six-state transition diagram)
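Rabiner's "stochastic transitions and symbol emissions" can be made concrete by sampling from a toy HMM. A minimal Python sketch; the states, vocabulary, and all probability tables are hypothetical:

```python
import random

# Hypothetical HMM in the slide's notation: states, vocabulary Σ,
# initial distribution π, transition matrix A, emission matrix B.
states = ["1", "2"]
sigma = ["x", "y"]
pi = {"1": 0.5, "2": 0.5}
A = {"1": {"1": 0.9, "2": 0.1}, "2": {"1": 0.4, "2": 0.6}}
B = {"1": {"x": 0.8, "y": 0.2}, "2": {"x": 0.3, "y": 0.7}}

def sample(n, rng=random.Random(0)):
    """Walk the hidden chain for n steps, emitting one symbol per state."""
    def draw(dist):
        # Sample a key from a {key: probability} table.
        r, acc = rng.random(), 0.0
        for k, p in dist.items():
            acc += p
            if r < acc:
                return k
        return k  # guard against floating-point shortfall

    q = draw(pi)          # stochastic start
    out = []
    for _ in range(n):
        out.append(draw(B[q]))  # symbol emission from the current state
        q = draw(A[q])          # stochastic transition
    return out

obs = sample(5)  # a list of 5 symbols from Σ; the states stay hidden
```

Only the emitted symbols are returned; the state path is discarded, which is exactly what makes the model "hidden".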
Problems • Language modeling: the probability of a string o (of length l) being emitted by an HMM M; naive enumeration considers N^l state sequences • Tagging: recover the state sequence V(o | M) that has the highest probability of having produced the observation sequence o • Parameter estimation: given the observation sequence and a set of states, find the parameters that make the observation sequence most likely
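The first two problems have classical dynamic-programming solutions, previewed in the agenda (Rabiner 1989 for the forward algorithm, Viterbi 1967 for tagging). A minimal Python sketch of both on a hypothetical two-state HMM (all parameter values invented):

```python
# Hypothetical HMM parameters: initial π, transitions A, emissions B.
pi = {"H": 0.6, "L": 0.4}
A = {"H": {"H": 0.7, "L": 0.3}, "L": {"H": 0.4, "L": 0.6}}
B = {"H": {"a": 0.5, "b": 0.5}, "L": {"a": 0.1, "b": 0.9}}

def forward(obs):
    """Language modeling: P(o | M) in O(N² · l) instead of N^l paths."""
    alpha = {q: pi[q] * B[q][obs[0]] for q in pi}
    for o in obs[1:]:
        alpha = {q: sum(alpha[r] * A[r][q] for r in alpha) * B[q][o]
                 for q in pi}
    return sum(alpha.values())

def viterbi(obs):
    """Tagging: the state sequence with highest probability of emitting obs."""
    delta = {q: (pi[q] * B[q][obs[0]], [q]) for q in pi}
    for o in obs[1:]:
        new = {}
        for q in pi:
            p, best = max((delta[r][0] * A[r][q], r) for r in delta)
            new[q] = (p * B[q][o], delta[best][1] + [q])
        delta = new
    return max(delta.values(), key=lambda t: t[0])[1]
```

The forward recursion replaces the max in Viterbi with a sum: the two algorithms share the same trellis, differing only in how paths into each state are combined. The third problem (parameter estimation, Baum-Welch) reuses these quantities but is omitted here.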