Extended Baum-Welch Algorithm
Presented by Shih-Hung Liu, 2006/01/21
References
• A generalization of the Baum algorithm to rational objective functions [Gopalakrishnan et al.], IEEE ICASSP 1989
• An inequality for rational functions with applications to some statistical estimation problems [Gopalakrishnan et al.], IEEE Transactions on Information Theory 1991
• HMMs, MMIE, and the Speech Recognition Problem [Normandin], PhD dissertation, 1991
• Function maximization [Povey], PhD thesis, chapter 4.5, 2004
Outline
• Introduction
• Extended Baum-Welch algorithm [Gopalakrishnan et al.]
• EBW from discrete to continuous [Normandin]
• EBW for discrete HMMs [Povey]
• Example of function optimization [Gopalakrishnan et al.]
• Conclusion
Introduction
• The well-known Baum-Eagon inequality provides an effective iterative scheme for finding a local maximum of homogeneous polynomials with positive coefficients over a domain of probability values
• However, we are interested in maximizing a general rational function, so we extend the Baum-Eagon inequality to rational functions
Extended Baum-Welch algorithm (1/6) [Gopalakrishnan 1989]
• Let $P(x)$ be an arbitrary homogeneous polynomial with nonnegative coefficients of degree $d$ in the variables $x = \{x_{ij}\}$. Assuming this polynomial is defined over a domain $\mathcal{D}$ of probability values (each row satisfies $\sum_j x_{ij} = 1$), they show how to construct a transformation $T : \mathcal{D} \to \mathcal{D}$ with the following property:
• Property A: for any $x \in \mathcal{D}$, $P(T(x)) > P(x)$ unless $T(x) = x$
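To make the transformation concrete, here is a minimal sketch of one Baum-Eagon growth step in Python; the toy degree-2 polynomial and its gradient are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of one Baum-Eagon growth step, assuming a toy degree-2
# homogeneous polynomial with nonnegative coefficients; the polynomial and
# its gradient below are illustrative, not taken from the paper.
import numpy as np

def baum_eagon_step(x, grad_P):
    """T(x)_ij = x_ij dP/dx_ij / sum_j' x_ij' dP/dx_ij' (row-wise)."""
    g = x * grad_P(x)
    return g / g.sum(axis=1, keepdims=True)

# Toy polynomial P(x) = 3*x00*x10 + x01*x11, rows of x summing to one
grad = lambda x: np.array([[3 * x[1, 0], x[1, 1]],
                           [3 * x[0, 0], x[0, 1]]])

x = np.full((2, 2), 0.5)
for _ in range(10):
    x = baum_eagon_step(x, grad)   # P(x) never decreases (Property A)
```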
Extended Baum-Welch algorithm (2/6) [Gopalakrishnan 1989]
• Let $R(x) = P_1(x)/P_2(x)$ be a ratio of two polynomials in the variables $x = \{x_{ij}\}$, defined over the domain $\mathcal{D}$; we are looking for a growth transformation $T$ such that for any $x \in \mathcal{D}$, $R(T(x)) > R(x)$ unless $T(x) = x$
• Reduction of the rational-function case to the polynomial case: we reduce the problem of finding a growth transformation for a rational function to that of finding one for a specially formed polynomial
• This reduces further to a non-homogeneous polynomial with nonnegative coefficients
• Finally, the Baum-Eagon inequality is extended to non-homogeneous polynomials with nonnegative coefficients
Extended Baum-Welch algorithm (3/6) [Gopalakrishnan 1989]
• Step 1: reduce the rational function to a polynomial. At the current point $x_0$, form $Q_{x_0}(x) = P_1(x) - R(x_0)\,P_2(x)$. Then $Q_{x_0}(x_0) = 0$, and any $x$ with $Q_{x_0}(x) > 0$ satisfies $R(x) > R(x_0)$, so a growth transformation for $Q_{x_0}$ is also one for $R$
Extended Baum-Welch algorithm (4/6) [Gopalakrishnan 1989]
• Step 2: make the coefficients nonnegative. $Q_{x_0}$ may have negative coefficients, so add a constant $C$ times a polynomial that is identically constant over $\mathcal{D}$ (e.g. $C\,(\sum_j x_{ij})^d = C$), with $C$ large enough that every coefficient becomes nonnegative; this only shifts $Q_{x_0}$ by a constant and does not change the growth property
Extended Baum-Welch algorithm (5/6) [Gopalakrishnan 1989]
• Step 3: finding a growth transformation for a polynomial with nonnegative coefficients can be reduced to the same problem for a homogeneous polynomial with nonnegative coefficients: multiply each monomial by enough factors of $\sum_j x_{ij} = 1$ to bring every term to the same degree, which changes nothing over $\mathcal{D}$
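Putting the three steps together, a hedged sketch of one EBW growth step for a rational objective $R = P_1/P_2$ might look as follows; the constant C stands in for the Step-2 constant polynomial, and all function names are assumptions for illustration.

```python
# A hedged sketch of one EBW growth step for a rational objective
# R(x) = P1(x)/P2(x) over rows of probabilities, combining Steps 1-3.
# All names (P1, P2, grad_P1, grad_P2, C) are illustrative assumptions.
import numpy as np

def ebw_rational_step(x, P1, P2, grad_P1, grad_P2, C):
    """T(x)_ij = x_ij (dQ/dx_ij + C) / sum_j' x_ij' (dQ/dx_ij' + C),
    where Q(y) = P1(y) - R(x) * P2(y) is the Step-1 polynomial and the
    constant C plays the role of the Step-2 constant polynomial."""
    R0 = P1(x) / P2(x)
    g = x * (grad_P1(x) - R0 * grad_P2(x) + C)  # C large enough => g > 0
    return g / g.sum(axis=1, keepdims=True)     # renormalize each row
```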
Extended Baum-Welch algorithm (6/6) [Gopalakrishnan 1989]
• Baum-Eagon inequality: for a homogeneous polynomial $P$ with nonnegative coefficients over $\mathcal{D}$, the transformation $T(x)_{ij} = \dfrac{x_{ij}\,\partial P/\partial x_{ij}\big|_x}{\sum_{j'} x_{ij'}\,\partial P/\partial x_{ij'}\big|_x}$ satisfies $P(T(x)) > P(x)$ unless $T(x) = x$
EBW for CDHMM – from discrete to continuous (1/3) [Normandin 1991]
• Discrete case for the emission probability update: $\hat{b}_j(k) = \dfrac{\gamma^{\text{num}}_j(k) - \gamma^{\text{den}}_j(k) + C\,b_j(k)}{\sum_{k'} \big(\gamma^{\text{num}}_j(k') - \gamma^{\text{den}}_j(k') + C\,b_j(k')\big)}$, where $\gamma^{\text{num}}_j(k)$ and $\gamma^{\text{den}}_j(k)$ are the numerator (clamped) and denominator (free) occupancy counts and $C$ is chosen large enough to keep all terms positive
EBW for CDHMM – from discrete to continuous (2/3) [Normandin 1991]
• Approximate each continuous emission density by a discrete distribution over $M$ subintervals $I_k$ of width $\Delta$: $b_j(k) \approx b_j(o)\,\Delta$ for $o \in I_k$
• Applying the discrete update and letting $\Delta \to 0$ (with $M \to \infty$) yields update equations for the continuous-density parameters
EBW for CDHMM – from discrete to continuous (3/3) [Normandin 1991]
• In the limit, the EBW updates for the Gaussian means and variances become
$\hat{\mu}_{jm} = \dfrac{\theta^{\text{num}}_{jm}(O) - \theta^{\text{den}}_{jm}(O) + D\,\mu_{jm}}{\gamma^{\text{num}}_{jm} - \gamma^{\text{den}}_{jm} + D}$ and
$\hat{\sigma}^2_{jm} = \dfrac{\theta^{\text{num}}_{jm}(O^2) - \theta^{\text{den}}_{jm}(O^2) + D\,(\sigma^2_{jm} + \mu^2_{jm})}{\gamma^{\text{num}}_{jm} - \gamma^{\text{den}}_{jm} + D} - \hat{\mu}^2_{jm}$,
where $\theta(O)$ and $\theta(O^2)$ are occupancy-weighted first- and second-order statistics and $D$ is the smoothing constant
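As a sketch of the limiting result, the following Python function applies these mean/variance updates for one diagonal Gaussian; the statistic names and the smoothing constant D follow common MMI notation and are assumptions, not copied from the dissertation.

```python
# A sketch of the limiting continuous-density updates for one diagonal
# Gaussian; statistic names and the constant D follow common MMI notation
# and are assumptions, not copied from the dissertation.
import numpy as np

def ebw_gaussian_update(mu, var, gamma_num, gamma_den,
                        o_num, o_den, o2_num, o2_den, D):
    """mu, var       : current mean and variance (arrays, per dimension)
    gamma_num/den : scalar occupancy counts from num/den lattices
    o_*, o2_*     : occupancy-weighted sums of o and o**2
    D             : smoothing constant, large enough to keep var positive"""
    denom = gamma_num - gamma_den + D
    new_mu = (o_num - o_den + D * mu) / denom
    new_var = (o2_num - o2_den + D * (var + mu**2)) / denom - new_mu**2
    return new_mu, new_var
```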
EBW for discrete HMMs (1/6) [Povey 2004]
• The Baum-Eagon inequality is formulated for the case where the variables $x_{ij}$ form a matrix whose rows obey a sum-to-one constraint $\sum_j x_{ij} = 1$, and we are maximizing a sum of polynomial terms in $x_{ij}$ with nonnegative coefficients
• For ML training, we can find an auxiliary function and optimize it
• Finding the maximum of the auxiliary function (e.g. using a Lagrange multiplier, as sketched below) leads to the following update, which is a growth transformation for the polynomial: $x_{ij} := \dfrac{x_{ij}\,\partial F/\partial x_{ij}}{\sum_{j'} x_{ij'}\,\partial F/\partial x_{ij'}}$
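For reference, the Lagrange-multiplier step mentioned above can be written out explicitly; with $c_{ij} = x'_{ij}\,\partial F/\partial x_{ij}$ playing the role of expected counts, the stationarity condition recovers the growth transformation (a standard derivation, in possibly different notation than the original slide):

```latex
% ML auxiliary function: expected counts times log-parameters,
% maximized subject to the row sum-to-one constraints.
\mathcal{L} = \sum_{i,j} c_{ij}\log x_{ij}
  + \sum_i \lambda_i \Big(1 - \sum_j x_{ij}\Big), \qquad
\frac{\partial \mathcal{L}}{\partial x_{ij}}
  = \frac{c_{ij}}{x_{ij}} - \lambda_i = 0
  \;\Longrightarrow\;
  \hat{x}_{ij} = \frac{c_{ij}}{\sum_{j'} c_{ij'}}
```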
EBW for discrete HMMs (2/6) [Povey 2004]
• The Baum-Welch update is an update procedure for HMMs that uses this growth transformation together with the forward-backward algorithm, which computes the relevant differentials efficiently
EBW for discrete HMMs (3/6) [Povey 2004]
• An update rule as convenient and provably correct as the Baum-Welch update is not available for discriminative training of HMMs, which is a harder optimization problem
• The Extended Baum-Welch update equations as originally derived are applicable to rational functions of parameters that are subject to sum-to-one constraints
• The MMI objective function for discrete-probability HMMs is an example of such a function
EBW for discrete HMMs (4/6) [Povey 2004]
Two essential points are used to derive the EBW update for MMI:
1. Instead of maximizing $F(x) = \dfrac{P_1(x)}{P_2(x)}$ for positive $P_1$ and $P_2$, we can instead maximize $G(x) = P_1(x) - \dfrac{P_1(x')}{P_2(x')}\,P_2(x)$, where $x'$ is the value from the previous iteration; increasing $G$ will cause $F$ to increase, because $G$ is a strong-sense auxiliary function for $F$ around $x = x'$
2. If some terms in the resulting polynomial are negative, we can add to the expression a constant $C$ times a further polynomial which is constrained to be a constant (e.g. $C \prod_i \sum_j x_{ij} = C$), so as to ensure that no product of terms in the final expression has a negative coefficient
EBW for discrete HMMs (5/6) [Povey 2004]
• By applying these two ideas, the update becomes
$\hat{x}_{ij} = \dfrac{x_{ij}\big(\partial F^{\text{num}}/\partial x_{ij} - \partial F^{\text{den}}/\partial x_{ij} + C\big)}{\sum_{j'} x_{ij'}\big(\partial F^{\text{num}}/\partial x_{ij'} - \partial F^{\text{den}}/\partial x_{ij'} + C\big)}$,
or, written in terms of occupancy counts, $\hat{x}_{ij} = \dfrac{\gamma^{\text{num}}_{ij} - \gamma^{\text{den}}_{ij} + C\,x_{ij}}{\sum_{j'}\big(\gamma^{\text{num}}_{ij'} - \gamma^{\text{den}}_{ij'} + C\,x_{ij'}\big)}$
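A minimal sketch of this final update in Python, assuming the numerator and denominator occupancies have already been accumulated by forward-backward passes (the names gamma_num and gamma_den are illustrative):

```python
# Minimal sketch of the final discrete-HMM MMI update, assuming the
# occupancies gamma_num/gamma_den were accumulated by forward-backward
# passes over the numerator (clamped) and denominator (free) lattices.
import numpy as np

def ebw_discrete_update(x, gamma_num, gamma_den, C):
    """x_ij <- (gamma_num_ij - gamma_den_ij + C x_ij) / row sum of same."""
    new = gamma_num - gamma_den + C * x
    # C must be large enough that every entry of `new` stays positive.
    return new / new.sum(axis=1, keepdims=True)
```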
EBW equivalent smooth function (6/6) [Povey 2004]
Example
• Consider maximizing a rational function of probability values, following the worked example in [Gopalakrishnan et al. 1989], with smoothing constant $C$
Example (continued)
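The equations of the original example slides did not survive extraction; as a hedged stand-in, the following toy run applies the EBW step sketched earlier to an invented rational function over a single probability row.

```python
# The original example equations were lost in extraction; as a hedged
# stand-in, this toy run maximizes an invented rational function
# R(x) = P1/P2 over one probability row with the EBW step sketched earlier.
import numpy as np

P1 = lambda x: 2 * x[0, 0] ** 2 + x[0, 1]      # numerator polynomial
P2 = lambda x: x[0, 0] + 2 * x[0, 1] + 0.5     # denominator, positive on the simplex
g1 = lambda x: np.array([[4 * x[0, 0], 1.0]])  # gradient of P1
g2 = lambda x: np.array([[1.0, 2.0]])          # gradient of P2

x, C = np.array([[0.5, 0.5]]), 5.0
for step in range(50):
    R0 = P1(x) / P2(x)
    g = x * (g1(x) - R0 * g2(x) + C)
    x = g / g.sum()
# From this starting point, R(x) increases monotonically and x -> [1, 0]
print(x, P1(x) / P2(x))
```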
Conclusion
• Presented an algorithm for the maximization of certain rational functions defined over domains of probability values
• This algorithm is very useful in practical situations for training HMM parameters
MPE: Final Auxiliary Function
• weak-sense auxiliary function
• strong-sense auxiliary function
• smoothing function involved weak-sense auxiliary function
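For readers unfamiliar with this terminology, Povey's definitions can be stated compactly (a summary from the thesis, chapter 4, in possibly different notation): $G$ is an auxiliary function for $F$ around the current point $\lambda'$ in the strong or weak sense when

```latex
% Strong sense: any increase in G guarantees at least that increase in F.
\text{strong sense:}\quad
  G(\lambda;\lambda') - G(\lambda';\lambda') \le F(\lambda) - F(\lambda')
% Weak sense: G merely matches the gradient of F at the current point.
\qquad
\text{weak sense:}\quad
  \left.\frac{\partial G(\lambda;\lambda')}{\partial \lambda}\right|_{\lambda=\lambda'}
  = \left.\frac{\partial F(\lambda)}{\partial \lambda}\right|_{\lambda=\lambda'}
```

A smoothing function is any function with a maximum (hence zero gradient) at $\lambda = \lambda'$; adding one to a weak-sense auxiliary function leaves the weak-sense property intact, which is how the final MPE auxiliary function is assembled.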
EBW derived from auxiliary function