Forward-backward algorithm
LING 572, Fei Xia, 02/23/06
Outline
• Forward and backward probability
• Expected counts and update formulae
• Relation with EM
HMM
• An HMM is a tuple (S, Σ, π, A, B):
  • A set of states S = {s_1, s_2, …, s_N}.
  • A set of output symbols Σ = {w_1, …, w_M}.
  • Initial state probabilities π = {π_i}.
  • State transition probabilities A = {a_ij}.
  • Symbol emission probabilities B = {b_ijk}.
• State sequence: X_1 … X_{T+1}
• Output sequence: o_1 … o_T
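To make the notation concrete for the code sketches that follow, the tuple can be held in a small container. This is a minimal sketch, not part of the slides; the NumPy array shapes are my assumptions for an arc-emission HMM.

```python
from typing import NamedTuple
import numpy as np

class HMM(NamedTuple):
    """Arc-emission HMM parameters (shapes assumed for illustration)."""
    pi: np.ndarray  # (N,)      initial state probabilities, pi[i] = pi_i
    A: np.ndarray   # (N, N)    transition probabilities, A[i, j] = a_ij
    B: np.ndarray   # (N, N, M) emission probabilities, B[i, j, k] = b_ijk
```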
Decoding
[Figure: trellis of states X_1 … X_{T+1} emitting o_1 … o_T]
• Given the observation O_{1,T} = o_1…o_T, find the state sequence X_{1,T+1} = X_1 … X_{T+1} that maximizes P(X_{1,T+1} | O_{1,T}): the Viterbi algorithm (a sketch follows below).
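The slides only name Viterbi; a minimal sketch of it for the arc-emission parameterization above, assuming NumPy arrays and 0-indexed symbol indices, might look like this:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence X_1 ... X_{T+1}; obs holds symbol indices o_1 ... o_T."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T + 1, N))           # delta[t, j]: best score of a path in state j at time t+1
    psi = np.zeros((T + 1, N), dtype=int)  # psi[t, j]: best predecessor of state j at time t+1
    delta[0] = pi
    for t in range(T):
        # extend each path along the arc i -> j that emits o_{t+1}
        scores = delta[t][:, None] * A * B[:, :, obs[t]]
        psi[t + 1] = scores.argmax(axis=0)
        delta[t + 1] = scores.max(axis=0)
    path = [int(delta[T].argmax())]        # backtrace from the best final state
    for t in range(T, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]                      # states for times 1 .. T+1
```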
Notation
• A sentence: O_{1,T} = o_1…o_T; T is the sentence length.
• The state sequence: X_{1,T+1} = X_1 … X_{T+1}.
• t: time t, ranging from 1 to T+1.
• X_t: the state at time t.
• i, j: states s_i, s_j.
• k: word w_k in the vocabulary.
Forward probability
The probability of producing o_1 … o_{t-1} while ending up in state s_i:
$\alpha_i(t) = P(o_{1,t-1}, X_t = s_i)$
Calculating forward probability
• Initialization: $\alpha_i(1) = \pi_i$
• Induction: $\alpha_j(t+1) = \sum_{i=1}^{N} \alpha_i(t)\, a_{ij}\, b_{ij o_t}$
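A minimal NumPy sketch of this recursion, assuming the array layout above; row t of alpha stores the slides' α(t+1), so row 0 is time 1 and row T is time T+1.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward pass for an arc-emission HMM; obs holds symbol indices o_1 ... o_T."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T + 1, N))
    alpha[0] = pi                              # base case: alpha_i(1) = pi_i
    for t in range(T):
        # alpha_j(t+1) = sum_i alpha_i(t) * a_ij * b_{ij,o_t}
        alpha[t + 1] = alpha[t] @ (A * B[:, :, obs[t]])
    return alpha
```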
Backward probability
The probability of producing the sequence O_{t,T}, given that at time t we are in state s_i:
$\beta_i(t) = P(o_{t,T} \mid X_t = s_i)$
Calculating backward probability
• Initialization: $\beta_i(T+1) = 1$
• Induction: $\beta_i(t) = \sum_{j=1}^{N} a_{ij}\, b_{ij o_t}\, \beta_j(t+1)$
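The mirror-image sketch for the backward pass, under the same assumptions; row t of beta stores the slides' β(t+1).

```python
import numpy as np

def backward(A, B, obs):
    """Backward pass for an arc-emission HMM; obs holds symbol indices o_1 ... o_T."""
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T + 1, N))
    beta[T] = 1.0                              # base case: beta_i(T+1) = 1
    for t in range(T - 1, -1, -1):
        # beta_i(t) = sum_j a_ij * b_{ij,o_t} * beta_j(t+1)
        beta[t] = (A * B[:, :, obs[t]]) @ beta[t + 1]
    return beta
```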
Estimating parameters
The probability of traversing the arc from s_i to s_j at time t, given O (denoted by p_t(i, j) in M&S):
$p_t(i,j) = P(X_t = s_i, X_{t+1} = s_j \mid O) = \dfrac{\alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1)}{P(O)}$, where $P(O) = \sum_{m=1}^{N} \alpha_m(T+1)$.
Expected counts
Sum over the time index:
• Expected # of transitions from state i to j in O: $\sum_{t=1}^{T} p_t(i,j)$
• Expected # of transitions from state i in O: $\sum_{t=1}^{T} \sum_{j=1}^{N} p_t(i,j)$
Emission probabilities
Arc-emission HMM:
$\hat{b}_{ijk} = \dfrac{\sum_{t:\, o_t = w_k} p_t(i,j)}{\sum_{t=1}^{T} p_t(i,j)}$
The inner loop of the forward-backward algorithm
Given an input sequence O and the current parameters (π, A, B):
• Calculate forward probabilities:
  • Base case: $\alpha_i(1) = \pi_i$
  • Recursive case: $\alpha_j(t+1) = \sum_i \alpha_i(t)\, a_{ij}\, b_{ij o_t}$
• Calculate backward probabilities:
  • Base case: $\beta_i(T+1) = 1$
  • Recursive case: $\beta_i(t) = \sum_j a_{ij}\, b_{ij o_t}\, \beta_j(t+1)$
• Calculate expected counts: $p_t(i,j) = \alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1) / P(O)$
• Update the parameters (a code sketch follows below):
  $\hat{\pi}_i = \sum_j p_1(i,j)$, $\hat{a}_{ij} = \dfrac{\sum_t p_t(i,j)}{\sum_t \sum_j p_t(i,j)}$, $\hat{b}_{ijk} = \dfrac{\sum_{t:\, o_t = w_k} p_t(i,j)}{\sum_t p_t(i,j)}$
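Putting the four steps together for a single training sequence, a sketch of one inner-loop pass, assuming the forward and backward functions above (no smoothing or zero-count handling):

```python
import numpy as np

def em_step(pi, A, B, obs):
    """One forward-backward update on one sequence; returns new parameters and P(O)."""
    T, N = len(obs), len(pi)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    prob = alpha[T].sum()                      # P(O) = sum_m alpha_m(T+1)
    p = np.empty((T, N, N))                    # p[t, i, j] = p_t(i, j)
    for t in range(T):
        p[t] = alpha[t][:, None] * A * B[:, :, obs[t]] * beta[t + 1] / prob
    gamma = p.sum(axis=2)                      # gamma[t, i]: expected count of leaving state i at time t
    new_pi = gamma[0]                          # expected frequency in state i at time 1
    new_A = p.sum(axis=0) / gamma.sum(axis=0)[:, None]
    new_B = np.empty_like(B)
    obs = np.asarray(obs)
    for k in range(B.shape[2]):                # sum p_t(i, j) over the times where o_t = w_k
        new_B[:, :, k] = p[obs == k].sum(axis=0) / p.sum(axis=0)
    return new_pi, new_A, new_B, prob
```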
Relation to EM
• An HMM is a PM (Product of Multinomials) model.
• The forward-backward algorithm is a special case of the EM algorithm for PM models.
• X (observed data): each data point is an output sequence O_{1,T}.
• Y (hidden data): the state sequence X_{1,T+1}.
• Θ (parameters): a_ij, b_ijk, π_i.
Iterations
• Each iteration provides values for all the parameters.
• The new model always improves the likelihood of the training data: $P(O \mid \hat{\Theta}) \ge P(O \mid \Theta)$.
• The algorithm is not guaranteed to reach a global maximum.
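A sketch of the outer loop around em_step; the stopping rule (an iteration cap plus a log-likelihood tolerance) is my choice, not something the slides specify.

```python
import numpy as np

def baum_welch(pi, A, B, obs, max_iter=50, tol=1e-6):
    """Iterate EM steps; each step cannot decrease P(O | theta)."""
    prev_ll = -np.inf
    for _ in range(max_iter):
        pi, A, B, prob = em_step(pi, A, B, obs)
        ll = np.log(prob)
        if ll - prev_ll < tol:                 # converged, possibly to a local maximum
            break
        prev_ll = ll
    return pi, A, B
```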
Summary
• A way of estimating parameters for HMMs.
• Define forward and backward probabilities, which can be calculated efficiently with dynamic programming (DP).
• Given an initial parameter setting, we re-estimate the parameters at each iteration.
• The forward-backward algorithm is a special case of the EM algorithm for PM models.
Definitions so far
• The prob of producing o_1 … o_{t-1} and ending at state s_i at time t: $\alpha_i(t) = P(o_{1,t-1}, X_t = s_i)$
• The prob of producing the sequence O_{t,T}, given that at time t we are in state s_i: $\beta_i(t) = P(o_{t,T} \mid X_t = s_i)$
• The prob of being in state s_i at time t, given O: $\gamma_i(t) = P(X_t = s_i \mid O) = \dfrac{\alpha_i(t)\, \beta_i(t)}{P(O)}$
Emission probabilities
• Arc-emission HMM: $b_{ijk} = P(o_t = w_k \mid X_t = s_i, X_{t+1} = s_j)$
• State-emission HMM: $b_{ik} = P(o_t = w_k \mid X_t = s_i)$