
The estimation of stochastic context-free grammars using the Inside-Outside algorithm



  1. The estimation of stochastic context-free grammars using the Inside-Outside algorithm 1998. 10. 16. Oh-Woog Kwon, KLE Lab., CSE, POSTECH

  2. Contents • Introduction • The Inside-Outside algorithm • Regular versus context-free grammar • Pre-training • The use of grammar minimization • Implementation • Conclusions

  3. Introduction - 1 • HMM => SCFG in speech recognition tasks • The advantages of SCFG's • ability to capture embedded structure within speech data • useful at lower levels such as the phonological rule system • learning: a simple extension of the Baum-Welch re-estimation procedure (the Inside-Outside algorithm) • Little previous work on SCFG's in speech • Two factors behind the limited interest • the increased power of CFG's was thought unnecessary for natural language: if the set of sentences is finite, a CFG is equivalent to an RG • the time complexity of the Inside-Outside algorithm: O(n³), cubic in both the input string length and the number of grammar symbols

  4. Introduction - 2 • Usefulness of CFG's in NL • the ability to model derivation probabilities matters more than the ability to determine language membership • So, this paper • introduces the Inside-Outside algorithm • compares CFG with RG using the entropy of the language generated by each grammar • Reduction of the time complexity of the Inside-Outside algorithm • This paper • describes a novel pre-training algorithm (fewer re-estimation iterations) • minimizes the number of non-terminals with grammar minimization (GM): fewer symbols • implements the Inside-Outside algorithm on a parallel transputer array: less training data per processor

  5. The Inside-Outside algorithm - 1 • Chomsky Normal Form (CNF) in SCFG: every rule has the form i → j k or i → m (i, j, k non-terminals, m a terminal) • Generated observation sequence: O = O1, O2, …, OT • The matrices of parameters: a[i,j,k] = P(i → j k) and b[i,m] = P(i → m), with Σj,k a[i,j,k] + Σm b[i,m] = 1 for each non-terminal i • Applications of SCFG's • recognition: compute P(O | G) • training: re-estimate the A and B matrices so as to maximize P(O | G)
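
A minimal sketch of this parameter representation (not from the paper: numpy arrays, 0-based indices, and the function name is mine), in which each non-terminal's outgoing rule probabilities sum to one:

```python
import numpy as np

def init_scfg(N, M, seed=0):
    """Random CNF-SCFG parameters: A[i,j,k] = P(i -> j k), B[i,m] = P(i -> m).
    For each non-terminal i: sum_jk A[i,j,k] + sum_m B[i,m] = 1."""
    rng = np.random.default_rng(seed)
    A = rng.random((N, N, N))
    B = rng.random((N, M))
    for i in range(N):
        z = A[i].sum() + B[i].sum()
        A[i] /= z
        B[i] /= z
    return A, B
```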

  6. The Inside-Outside algorithm - 2 • Definitions of the inner (e) and outer (f) probabilities • inner probability e(s,t,i) = P(non-terminal i derives the sub-sequence Os … Ot) • outer probability f(s,t,i) = P(S derives O1 … Os-1 i Ot+1 … OT) • [Figure: parse-tree fragments illustrating the inner probability (i spanning positions s … t) and the outer probability (everything outside that span, positions 1 … s-1 and t+1 … T)]

  7. The Inside-Outside algorithm - 3 • Inner probability: computed bottom-up • Case 1 (s = t), rules of the form i → m: e(s,s,i) = b[i, Os] • Case 2 (s < t), rules of the form i → j k: e(s,t,i) = Σj Σk Σr=s..t-1 a[i,j,k] e(s,r,j) e(r+1,t,k) • [Figure: i → j k with j spanning s … r and k spanning r+1 … t]
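
A minimal sketch of the bottom-up computation, assuming the numpy representation from the init_scfg sketch above (0-based indices, so e[s, t, i] stands for e(s+1, t+1, i)):

```python
import numpy as np

def inside(A, B, O):
    """Inner probabilities e[s, t, i] = P(non-terminal i derives O_s .. O_t),
    filled bottom-up over increasing span length."""
    N = B.shape[0]
    T = len(O)
    e = np.zeros((T, T, N))
    for s in range(T):                         # case 1: single symbols, rules i -> m
        e[s, s, :] = B[:, O[s]]
    for span in range(2, T + 1):               # case 2: longer spans, rules i -> j k
        for s in range(T - span + 1):
            t = s + span - 1
            for r in range(s, t):              # split point: j covers s..r, k covers r+1..t
                e[s, t, :] += np.einsum('ijk,j,k->i', A, e[s, r, :], e[r + 1, t, :])
    return e
```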

  8. The Inside-Outside algorithm - 4 • Outer probability: computed top-down • f(s,t,i) = Σj Σk [ Σr=1..s-1 f(r,t,j) a[j,k,i] e(r,s-1,k) + Σr=t+1..T f(s,r,j) a[j,i,k] e(t+1,r,k) ], with f(1,T,i) = 1 if i = S and 0 otherwise • [Figure: the two cases, i as the right child of j → k i (k spans r … s-1) and i as the left child of j → i k (k spans t+1 … r)]
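
A matching sketch of the top-down computation (same conventions as above; `start` is the assumed index of the start symbol S):

```python
import numpy as np

def outside(A, e, start=0):
    """Outer probabilities f[s, t, i] = P(S derives O_1..O_{s-1} i O_{t+1}..O_T),
    filled top-down over decreasing span length; e is the inner-probability table."""
    T, _, N = e.shape
    f = np.zeros((T, T, N))
    f[0, T - 1, start] = 1.0                   # the start symbol spans the whole string
    for span in range(T - 1, 0, -1):
        for s in range(T - span + 1):
            t = s + span - 1
            for r in range(0, s):              # i is the right child of some j -> k i
                f[s, t, :] += np.einsum('jki,j,k->i', A, f[r, t, :], e[r, s - 1, :])
            for r in range(t + 1, T):          # i is the left child of some j -> i k
                f[s, t, :] += np.einsum('jik,j,k->i', A, f[s, r, :], e[t + 1, r, :])
    return f
```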

  9. The Inside-Outside algorithm - 5 • Recognition process • By setting s = 1, t = T: P(O | G) = e(1,T,S) • By setting s = t: P(O | G) = Σi e(s,s,i) f(s,s,i) = Σi b[i,Os] f(s,s,i), for any position s
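
A small usage example tying the sketches together; the grammar size, the toy observation sequence and the start-symbol index 0 are all hypothetical, and the functions are the ones sketched above:

```python
import numpy as np
# uses init_scfg, inside, outside from the sketches above
N, M = 3, 2
A, B = init_scfg(N, M)
O = [0, 1, 1, 0]                       # a toy observation sequence over M = 2 terminals
T = len(O)
e = inside(A, B, O)
f = outside(A, e, start=0)
P = e[0, T - 1, 0]                     # s = 1, t = T: start symbol derives the whole string
P_alt = float(e[1, 1, :] @ f[1, 1, :]) # s = t: the same quantity from any single position
assert np.isclose(P, P_alt)
```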

  10. The Inside-Outside algorithm - 6 • Training process • use the inner and outer probabilities to compute the expected number of times each rule i → j k and i → m is used in deriving O • re-estimate a[i,j,k] and b[i,m] from these expected counts (an EM procedure, as in Baum-Welch)

  11. The Inside-Outside algorithm - 7 • Re-estimation formulae for a[i,j,k] and b[i,m] • â[i,j,k] = [ Σs Σt>s Σr=s..t-1 f(s,t,i) a[i,j,k] e(s,r,j) e(r+1,t,k) ] / [ Σs Σt≥s f(s,t,i) e(s,t,i) ] • b̂[i,m] = [ Σs: Os=m f(s,s,i) e(s,s,i) ] / [ Σs Σt≥s f(s,t,i) e(s,t,i) ] • the numerator is the expected number of uses of the rule and the denominator the expected number of uses of non-terminal i, both scaled by 1/P(O | G)
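
A sketch of one re-estimation step built directly from these formulae (single observation sequence, so the 1/P(O | G) factors cancel between numerator and denominator; names and 0-based indexing are mine):

```python
import numpy as np

def reestimate(A, B, O, e, f):
    """Re-estimate a[i,j,k] and b[i,m] from the inner (e) and outer (f) tables
    of one observation sequence O."""
    N, M = B.shape
    T = len(O)
    num_A = np.zeros_like(A)               # expected uses of each rule i -> j k
    num_B = np.zeros_like(B)               # expected uses of each rule i -> m
    denom = np.zeros(N)                    # expected uses of each non-terminal i
    for s in range(T):
        for t in range(s, T):
            denom += f[s, t, :] * e[s, t, :]
            if s == t:
                num_B[:, O[s]] += f[s, s, :] * e[s, s, :]
            for r in range(s, t):
                num_A += A * np.einsum('i,j,k->ijk', f[s, t, :], e[s, r, :], e[r + 1, t, :])
    return num_A / denom[:, None, None], num_B / denom[:, None]
```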

  12. The Inside-Outside algorithm - 8 • The Inside-Outside algorithm
  1. Choose suitable initial values for the A and B matrices
  2. REPEAT
       A = … {Equation 20}
       B = … {Equation 21}
       P = … {Equation 11}
     UNTIL the change in P is less than a set threshold
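
The loop might look as follows, assuming the hypothetical init_scfg, inside, outside and reestimate sketches above (the equation numbers in the slide refer to the original paper and are left as they stand):

```python
import numpy as np
# uses init_scfg, inside, outside, reestimate from the sketches above

def inside_outside(O, N, M, tol=1e-6, max_iter=100):
    A, B = init_scfg(N, M)                 # step 1: initial A and B matrices
    prev_P = -np.inf
    for _ in range(max_iter):              # step 2: repeat until P barely changes
        e = inside(A, B, O)
        f = outside(A, e)
        P = e[0, len(O) - 1, 0]            # P(O | G)
        if abs(P - prev_P) < tol:
            break
        A, B = reestimate(A, B, O, e, f)
        prev_P = P
    return A, B, P
```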

  13. Regular versus context-free grammar • Measurements for the comparison • the entropy of an ε-representation of L • the empirical entropy • Language for the comparison: palindromes • The number of parameters for each grammar • SCFG: N (# of non-terminals), M (# of terminals) => N³ + NM • HMM (RG): K (# of states), M (# of terminals) => K² + (M+2)K • Condition for a fair comparison: N³ + NM ≈ K² + (M+2)K • The result (the ability to model derivation probabilities): SCFG > RG
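
To make the matching condition concrete, a purely illustrative calculation (the grammar sizes below are mine, not the paper's): given N and M, the HMM state count K with a comparable number of parameters solves K² + (M+2)K = N³ + NM.

```python
import math

def matching_states(N, M):
    """Number of SCFG parameters and the (real-valued) HMM state count K
    with roughly the same number of parameters."""
    params = N ** 3 + N * M
    K = (-(M + 2) + math.sqrt((M + 2) ** 2 + 4 * params)) / 2
    return params, K

print(matching_states(8, 2))   # N=8, M=2 -> 528 parameters, K ≈ 21 states
```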

  14. Pre-training - 1 • Goal: start off with good initial estimates • reduces the number of re-estimation cycles required (by about 40%) • facilitates the generation of a good final model • Pre-training:
  1. Use the Baum-Welch algorithm (O(n²)) to obtain a set of RG rules
  2. Convert the final RG matrices into initial SCFG matrices
  3. Start the Inside-Outside algorithm (O(n³)) from these initial matrices
  • Time complexity: a·n² + b·n³ << c·n³, provided b << c

  15. Pre-training - 2 • Conversion (RG => SCFG)
  (a) For each bjk, define Yj → k with probability bjk.
  (b) For each aij, define Xi → Ya Xj with probability aij.
  (c) For each Si, define S → Xi with probability Si.
      • Since Xi → Ya Xl with ail, this becomes S → Ya Xl with probability Si·ail.
  (d) For each Fj, define Xj → Ya with probability Fj.
      • Since Ya → k with bak, this becomes Xj → k with probability bak·Fj.
  • If the remaining parameters are left at zero, the grammar can still only behave as an RG, so:
      • add a floor value to all parameters (floor value = 1 / # of non-zero parameters)
      • re-normalize so that each non-terminal's probabilities sum to 1
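
The flooring and re-normalization step might look as follows (a sketch assuming the numpy A/B representation used in the earlier sketches; the floor value of 1 / (# of non-zero parameters) follows the slide):

```python
import numpy as np

def floor_and_renormalize(A, B):
    """Add a small floor to every parameter so the SCFG is no longer confined to the
    RG it was converted from, then renormalize each non-terminal's distribution."""
    floor = 1.0 / (np.count_nonzero(A) + np.count_nonzero(B))
    A = A + floor
    B = B + floor
    for i in range(A.shape[0]):
        z = A[i].sum() + B[i].sum()
        A[i] /= z
        B[i] /= z
    return A, B
```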

  16. The use of grammar minimization - 1 • Goal: detect and eliminate redundant and/or useless symbols • Good grammar: self-embedding • a CFG is self-embedding if there is a non-terminal A such that A ⇒* wAx and neither w nor x is ε • requires more non-terminal symbols • Smaller n: speeds up the Inside-Outside algorithm • Constraining the Inside-Outside algorithm • greedy symbols: take up too many non-terminals • constraints: • allocate a non-terminal to each terminal symbol • force the remaining non-terminals to model the hidden branching process • infeasible for practical applications (e.g. speech) because of inherent ambiguity

  17. The use of grammar minimization - 2 • Two ways in which GM can be incorporated into the Inside-Outside algorithm • First approach: computationally intractable • Inside-Outside: start with a fixed maximum number of symbols • GM: periodically detect and eliminate redundant and useless symbols • Second approach: more practical • Inside-Outside: start with the desired number of non-terminals • GM: periodically (or when log P(S) < threshold) detect and reallocate redundant symbols

  18. The use of grammar minimization - 3 • GM algorithm (ad hoc)
  1. Detect greedy symbols in a bottom-up fashion
     1.1 redundant non-terminals are replaced by a single non-terminal
     1.2 free the redundant non-terminals (free non-terminals)
     1.3 identical rules are collapsed into a single rule by adding their probabilities
  2. Fix the parameters of the remaining non-terminals involved in the generation of greedy symbols (these are excluded from steps 3 and 4)
  3. For each free non-terminal i,
     3.1 b[i,m] = 0 if m is a greedy symbol; otherwise randomize b[i,m]
     3.2 a[i,j,k] = 0 if j and k are both non-terminals of step 2; otherwise randomize a[i,j,k]
  4. Randomize a[i,j,k] where i is a non-terminal of step 2 and j, k are free non-terminals
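
Steps 3 and 4 might be sketched as below; the index sets (free non-terminals, the non-terminals kept in step 2, and the greedy terminal symbols) are assumed to come from steps 1 and 2, which are not shown, and all names are mine:

```python
import numpy as np

def reallocate_free_nonterminals(A, B, free, kept, greedy, seed=0):
    """free  : non-terminals released in step 1
       kept  : non-terminals whose parameters were fixed in step 2
       greedy: greedy terminal symbols."""
    rng = np.random.default_rng(seed)
    free, kept, greedy = list(free), list(kept), list(greedy)
    N, M = B.shape
    for i in free:
        B[i] = rng.random(M)                    # 3.1: randomize b[i,m] ...
        B[i, greedy] = 0.0                      #      ... except greedy m, which get zero
        A[i] = rng.random((N, N))               # 3.2: randomize a[i,j,k] ...
        A[np.ix_([i], kept, kept)] = 0.0        #      ... except j, k both kept, which get zero
    for i in kept:                              # 4: randomize a[i,j,k] with free children
        A[np.ix_([i], free, free)] = rng.random((1, len(free), len(free)))
    # a re-normalization of each non-terminal's distribution (as on slide 15) would follow
    return A, B
```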

  19. Implementation using a transputer array • Goal • speed up the Inside-Outside algorithm (about 100 times faster) • split the training data into several subsets • the Inside-Outside algorithm works independently on each subset • Implementation • [Diagram: a SUN host and control board connected to a chain of 64 transputers (Transputer 1 … Transputer 64); the updated parameter set is computed and transmitted down the chain to all the others, while each transputer works independently on its own data set]
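
A rough sketch of the same split-and-combine idea on a modern machine, with a process pool standing in for the transputer chain. It assumes the inside/outside sketches above plus a hypothetical expected_counts(A, B, O, e, f) helper returning the (un-normalized) numerators and denominator of the slide-11 formulae for one sentence:

```python
from multiprocessing import Pool
import numpy as np
# uses inside, outside from the sketches above and a hypothetical expected_counts helper

def counts_for_subset(args):
    """Accumulate expected-count statistics over one subset of the training data."""
    A, B, subset = args
    num_A, num_B, denom = np.zeros_like(A), np.zeros_like(B), np.zeros(B.shape[0])
    for O in subset:
        e = inside(A, B, O)
        f = outside(A, e)
        nA, nB, d = expected_counts(A, B, O, e, f)
        num_A += nA; num_B += nB; denom += d
    return num_A, num_B, denom

def parallel_reestimate(A, B, subsets, workers=4):
    """One re-estimation cycle: each worker handles its own subset, the host sums the
    counts and broadcasts the updated parameters (as the transputer chain does)."""
    with Pool(workers) as pool:
        parts = pool.map(counts_for_subset, [(A, B, s) for s in subsets])
    num_A = sum(p[0] for p in parts)
    num_B = sum(p[1] for p in parts)
    denom = sum(p[2] for p in parts)
    return num_A / denom[:, None, None], num_B / denom[:, None]
```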

  20. Conclusions • Usefulness of CFG's in NL • This paper • introduced the Inside-Outside algorithm for speech recognition • compared CFG with RG using the entropy of the language generated by each grammar on a "toy" problem • Reduction of the time complexity of the Inside-Outside algorithm • This paper • described a novel pre-training algorithm (fewer re-estimation iterations) • proposed an ad hoc grammar minimization (GM) procedure: fewer symbols • implemented the Inside-Outside algorithm on a parallel transputer array: less training data per processor • Further research • build SCFG models trained from real speech data
