Forward-backward algorithm
LING 572, Fei Xia, 02/23/06
Outline
• Forward and backward probability
• Expected counts and update formulae
• Relation with EM
HMM
• An HMM is a tuple (S, Σ, π, A, B):
  • A set of states S = {s_1, s_2, …, s_N}.
  • A set of output symbols Σ = {w_1, …, w_M}.
  • Initial state probabilities π = {π_i}.
  • State transition probabilities A = {a_ij}.
  • Symbol emission probabilities B = {b_ijk}.
• State sequence: X_1 … X_{T+1}
• Output sequence: o_1 … o_T
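To make the notation concrete for the code sketches that follow, the tuple can be held in a small container. This is a minimal sketch, not part of the slides; the NumPy array shapes are my assumptions for an arc-emission HMM.

```python
from typing import NamedTuple
import numpy as np

class HMM(NamedTuple):
    """Arc-emission HMM parameters (shapes assumed for illustration)."""
    pi: np.ndarray  # (N,)      initial state probabilities, pi[i] = pi_i
    A: np.ndarray   # (N, N)    transition probabilities, A[i, j] = a_ij
    B: np.ndarray   # (N, N, M) emission probabilities, B[i, j, k] = b_ijk
```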
Decoding
[Figure: trellis of states X_1 … X_{T+1} emitting o_1 … o_T]
• Given the observation O_{1,T} = o_1…o_T, find the state sequence X_{1,T+1} = X_1 … X_{T+1} that maximizes P(X_{1,T+1} | O_{1,T}): the Viterbi algorithm (a sketch follows below).
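The slides only name Viterbi; a minimal sketch of it for the arc-emission parameterization above, assuming NumPy arrays and 0-indexed symbol indices, might look like this:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence X_1 ... X_{T+1}; obs holds symbol indices o_1 ... o_T."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T + 1, N))           # delta[t, j]: best score of a path in state j at time t+1
    psi = np.zeros((T + 1, N), dtype=int)  # psi[t, j]: best predecessor of state j at time t+1
    delta[0] = pi
    for t in range(T):
        # extend each path along the arc i -> j that emits o_{t+1}
        scores = delta[t][:, None] * A * B[:, :, obs[t]]
        psi[t + 1] = scores.argmax(axis=0)
        delta[t + 1] = scores.max(axis=0)
    path = [int(delta[T].argmax())]        # backtrace from the best final state
    for t in range(T, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]                      # states for times 1 .. T+1
```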
Notation
• A sentence: O_{1,T} = o_1…o_T; T is the sentence length.
• The state sequence: X_{1,T+1} = X_1 … X_{T+1}.
• t: time t, ranging from 1 to T+1.
• X_t: the state at time t.
• i, j: states s_i, s_j.
• k: word w_k in the vocabulary.
Forward probability
The probability of producing o_1 … o_{t-1} while ending up in state s_i:
$\alpha_i(t) = P(o_{1,t-1}, X_t = s_i)$
Calculating forward probability
• Initialization: $\alpha_i(1) = \pi_i$
• Induction: $\alpha_j(t+1) = \sum_{i=1}^{N} \alpha_i(t)\, a_{ij}\, b_{ij o_t}$
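A minimal NumPy sketch of this recursion, assuming the array layout above; row t of alpha stores the slides' α(t+1), so row 0 is time 1 and row T is time T+1.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward pass for an arc-emission HMM; obs holds symbol indices o_1 ... o_T."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T + 1, N))
    alpha[0] = pi                              # base case: alpha_i(1) = pi_i
    for t in range(T):
        # alpha_j(t+1) = sum_i alpha_i(t) * a_ij * b_{ij,o_t}
        alpha[t + 1] = alpha[t] @ (A * B[:, :, obs[t]])
    return alpha
```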
Backward probability
The probability of producing the sequence O_{t,T}, given that at time t we are in state s_i:
$\beta_i(t) = P(o_{t,T} \mid X_t = s_i)$
Calculating backward probability
• Initialization: $\beta_i(T+1) = 1$
• Induction: $\beta_i(t) = \sum_{j=1}^{N} a_{ij}\, b_{ij o_t}\, \beta_j(t+1)$
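The mirror-image sketch for the backward pass, under the same assumptions; row t of beta stores the slides' β(t+1).

```python
import numpy as np

def backward(A, B, obs):
    """Backward pass for an arc-emission HMM; obs holds symbol indices o_1 ... o_T."""
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T + 1, N))
    beta[T] = 1.0                              # base case: beta_i(T+1) = 1
    for t in range(T - 1, -1, -1):
        # beta_i(t) = sum_j a_ij * b_{ij,o_t} * beta_j(t+1)
        beta[t] = (A * B[:, :, obs[t]]) @ beta[t + 1]
    return beta
```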
Estimating parameters
The probability of traversing the arc from s_i to s_j at time t, given O (denoted by p_t(i, j) in M&S):
$p_t(i,j) = P(X_t = s_i, X_{t+1} = s_j \mid O) = \dfrac{\alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1)}{P(O)}$, where $P(O) = \sum_{m=1}^{N} \alpha_m(T+1)$.
Expected counts
Sum over the time index:
• Expected # of transitions from state i to j in O: $\sum_{t=1}^{T} p_t(i,j)$
• Expected # of transitions from state i in O: $\sum_{t=1}^{T} \sum_{j=1}^{N} p_t(i,j)$
Emission probabilities
Arc-emission HMM:
$\hat{b}_{ijk} = \dfrac{\sum_{t:\, o_t = w_k} p_t(i,j)}{\sum_{t=1}^{T} p_t(i,j)}$
The inner loop of the forward-backward algorithm
Given an input sequence O and the current parameters (π, A, B):
• Calculate forward probabilities:
  • Base case: $\alpha_i(1) = \pi_i$
  • Recursive case: $\alpha_j(t+1) = \sum_i \alpha_i(t)\, a_{ij}\, b_{ij o_t}$
• Calculate backward probabilities:
  • Base case: $\beta_i(T+1) = 1$
  • Recursive case: $\beta_i(t) = \sum_j a_{ij}\, b_{ij o_t}\, \beta_j(t+1)$
• Calculate expected counts: $p_t(i,j) = \alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1) / P(O)$
• Update the parameters (a code sketch follows below):
  $\hat{\pi}_i = \sum_j p_1(i,j)$, $\hat{a}_{ij} = \dfrac{\sum_t p_t(i,j)}{\sum_t \sum_j p_t(i,j)}$, $\hat{b}_{ijk} = \dfrac{\sum_{t:\, o_t = w_k} p_t(i,j)}{\sum_t p_t(i,j)}$
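Putting the four steps together for a single training sequence, a sketch of one inner-loop pass, assuming the forward and backward functions above (no smoothing or zero-count handling):

```python
import numpy as np

def em_step(pi, A, B, obs):
    """One forward-backward update on one sequence; returns new parameters and P(O)."""
    T, N = len(obs), len(pi)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    prob = alpha[T].sum()                      # P(O) = sum_m alpha_m(T+1)
    p = np.empty((T, N, N))                    # p[t, i, j] = p_t(i, j)
    for t in range(T):
        p[t] = alpha[t][:, None] * A * B[:, :, obs[t]] * beta[t + 1] / prob
    gamma = p.sum(axis=2)                      # gamma[t, i]: expected count of leaving state i at time t
    new_pi = gamma[0]                          # expected frequency in state i at time 1
    new_A = p.sum(axis=0) / gamma.sum(axis=0)[:, None]
    new_B = np.empty_like(B)
    obs = np.asarray(obs)
    for k in range(B.shape[2]):                # sum p_t(i, j) over the times where o_t = w_k
        new_B[:, :, k] = p[obs == k].sum(axis=0) / p.sum(axis=0)
    return new_pi, new_A, new_B, prob
```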
Relation to EM
• An HMM is a PM (Product of Multinomials) model.
• The forward-backward algorithm is a special case of the EM algorithm for PM models.
• X (observed data): each data point is an output sequence O_{1,T}.
• Y (hidden data): the state sequence X_{1,T+1}.
• Θ (parameters): a_ij, b_ijk, π_i.
Iterations
• Each iteration provides values for all the parameters.
• The new model always improves the likelihood of the training data: $P(O \mid \hat{\Theta}) \ge P(O \mid \Theta)$.
• The algorithm is not guaranteed to reach a global maximum.
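A sketch of the outer loop around em_step; the stopping rule (an iteration cap plus a log-likelihood tolerance) is my choice, not something the slides specify.

```python
import numpy as np

def baum_welch(pi, A, B, obs, max_iter=50, tol=1e-6):
    """Iterate EM steps; each step cannot decrease P(O | theta)."""
    prev_ll = -np.inf
    for _ in range(max_iter):
        pi, A, B, prob = em_step(pi, A, B, obs)
        ll = np.log(prob)
        if ll - prev_ll < tol:                 # converged, possibly to a local maximum
            break
        prev_ll = ll
    return pi, A, B
```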
Summary
• A way of estimating parameters for HMMs.
• Define forward and backward probabilities, which can be calculated efficiently with dynamic programming (DP).
• Given an initial parameter setting, we re-estimate the parameters at each iteration.
• The forward-backward algorithm is a special case of the EM algorithm for PM models.
Definitions so far
• The prob of producing o_1 … o_{t-1} and ending at state s_i at time t: $\alpha_i(t) = P(o_{1,t-1}, X_t = s_i)$
• The prob of producing the sequence O_{t,T}, given that at time t we are in state s_i: $\beta_i(t) = P(o_{t,T} \mid X_t = s_i)$
• The prob of being in state s_i at time t, given O: $\gamma_i(t) = P(X_t = s_i \mid O) = \dfrac{\alpha_i(t)\, \beta_i(t)}{P(O)}$
Emission probabilities
• Arc-emission HMM: $b_{ijk} = P(o_t = w_k \mid X_t = s_i, X_{t+1} = s_j)$
• State-emission HMM: $b_{ik} = P(o_t = w_k \mid X_t = s_i)$