
A Heuristic Search Algorithm for LVCSR

Intelligent Multimedia Communications Lab. at POSTECH. 정홍, 박기윤.



Presentation Transcript

  1. A Heuristic Search Algorithm for LVCSR Intelligent Multimedia Communications Lab. at POSTECH 정홍, 박기윤

  2. Contents • Search problem in LVCSR • Heuristic search algorithms • Reinforcement learning (RL) • A heuristic function using RL • Conclusion

  3. Search Problem in LVCSR • Connected-word model: a sentence is a concatenation of words. • Given the word models, find the sequence of word hypotheses whose models best match the observation. • Depending on how the hypothesis tests are scheduled, several variants are possible [3].

  4. (Cont’d) • Blind (exhaustive) search algorithms: depth-first search and breadth-first search. • For LVCSR, each node may correspond to a word-internal state [2].

  5. Heuristic Search Algorithms • Heuristic function • To each node N, it assigns the sum h(N) of the partial path metric f(N) up to N and an estimate g(N) of the remaining path metric. • The more precise the estimates g(N) are, the more reliable the pruning operation is. • Heuristic search algorithm • Visit nodes with a large heuristic function value first. • Focus computational resources on the current decision.
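
The "visit nodes with a large heuristic function value first" rule can be illustrated with a small best-first search. This is only a sketch under assumptions: the toy graph, the node names, and the log-probability-style arc scores are hypothetical and do not come from the slides. It keeps the slide's naming, h(N) = f(N) + g(N), which differs from the usual A* convention of f = g + h.

```python
import heapq

def best_first_search(graph, estimates, start, goals):
    """Expand the node with the largest h(N) = f(N) + g(N) first.

    graph: dict mapping node -> list of (successor, arc_score) pairs
    estimates: dict mapping node -> g(N), the estimated remaining score
    Returns (path, score) for the first goal node popped, or None.
    """
    # heapq is a min-heap, so store -h(N) to pop the largest h(N) first.
    frontier = [(-estimates[start], 0.0, [start])]
    while frontier:
        _, f, path = heapq.heappop(frontier)
        node = path[-1]
        if node in goals:
            return path, f
        for succ, arc_score in graph.get(node, []):
            f_succ = f + arc_score            # partial path metric f(N)
            h_succ = f_succ + estimates[succ]  # heuristic value h(N)
            heapq.heappush(frontier, (-h_succ, f_succ, path + [succ]))
    return None

# Hypothetical toy graph; scores behave like log-probabilities (higher is better).
graph = {
    "root": [("a", -1.0), ("b", -2.0)],
    "a": [("end", -5.0)],
    "b": [("end", -1.0)],
}
estimates = {"root": -3.0, "a": -5.0, "b": -1.0, "end": 0.0}
result = best_first_search(graph, estimates, "root", {"end"})
```

With exact estimates, as in this toy case, the search reaches the goal through "b" without ever expanding "a". When g(N) is only approximate, the first goal popped is guaranteed to be the best path only if g(N) never underestimates the remaining score.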

  6. Reinforcement Learning (RL) • Incremental form of dynamic programming (by C.J.C.H. Watkins). • The agent selects actions to maximize the expected return. [Figure: agent-environment loop. The agent sends an action to the environment, which returns a reward and the next state.]

  7. (Cont’d) • Given a perfect Markov state, the RL problem becomes a Markov decision problem (MDP). • Policy π: a mapping from each state s and action a to the probability of taking action a when in state s. • Return: a function of the reward sequence. • Value function V^π(s): the expected return when starting in s and following π.
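
As a concrete instance of "a function of the reward sequence", the sketch below computes the discounted return, one common choice. The discount factor `gamma` and the sample rewards are assumptions made for illustration.

```python
def discounted_return(rewards, gamma):
    """Return r_1 + gamma * r_2 + gamma**2 * r_3 + ... for one episode."""
    g = 0.0
    # Accumulate from the last reward backwards: G_t = r_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical reward sequence: 1 + 0.5 * 0 + 0.25 * 2
example_return = discounted_return([1.0, 0.0, 2.0], gamma=0.5)
```

The value function of a state is then the expectation of this quantity over the episodes that start in that state and follow the policy.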

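  8. (Cont’d) • Generalized policy iteration (GPI), for a deterministic policy: • Policy evaluation: estimate the value function for a given policy. • Policy improvement: make the policy greedy with respect to the current value function. [Figure: policy and Markov model.]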
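
The evaluation and improvement steps can be sketched as textbook policy iteration in the style of [1]. The equations on the original slide are not preserved in this transcript, so this is not a reconstruction of them. The three-state MDP, its action names, and the discount factor are hypothetical.

```python
def action_value(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s, then using V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def policy_evaluation(P, policy, gamma, tol=1e-10):
    """Iteratively compute the value function of a deterministic policy."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:  # terminal state: value stays 0
                continue
            v = action_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def policy_improvement(P, V, gamma):
    """Make the policy greedy with respect to V."""
    return {s: max(P[s], key=lambda a: action_value(P, V, s, a, gamma))
            for s in P if P[s]}

def generalized_policy_iteration(P, gamma):
    """Alternate evaluation and improvement until the policy is stable."""
    policy = {s: next(iter(P[s])) for s in P if P[s]}  # arbitrary start
    while True:
        V = policy_evaluation(P, policy, gamma)
        new_policy = policy_improvement(P, V, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy

# Hypothetical MDP: P[state][action] = [(probability, next_state, reward), ...]
P = {
    "s0": {"left": [(1.0, "end", 1.0)], "right": [(1.0, "s1", 0.0)]},
    "s1": {"go": [(1.0, "end", 4.0)]},
    "end": {},
}
policy, V = generalized_policy_iteration(P, gamma=0.5)
```

In this toy case the initial policy takes "left" in s0 (value 1.0). After one improvement step it switches to "right", whose value is 0.5 * 4.0 = 2.0.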

  9. (Cont’d) • For prediction, it suffices to find the value function. • Using the λ-return instead of the discounted return can accelerate convergence.
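
A minimal sketch of the λ-return for one finished episode, using the standard recursion G_t = r_{t+1} + γ[(1 − λ)V(s_{t+1}) + λG_{t+1}] from [1]. The reward list and the value estimates below are made-up numbers.

```python
def lambda_returns(rewards, next_values, gamma, lam):
    """Lambda-return for every time step of one finished episode.

    rewards[t] is the reward received after step t.
    next_values[t] is the current estimate V(s_{t+1}); it must be 0.0
    for the terminal state at the end of the episode.
    """
    returns = [0.0] * len(rewards)
    g = 0.0  # lambda-return beyond the end of the episode
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns

# Hypothetical episode with undiscounted rewards (gamma = 1).
rewards = [1.0, 0.0, 2.0]
next_values = [0.5, 1.0, 0.0]
td_targets = lambda_returns(rewards, next_values, gamma=1.0, lam=0.0)
mc_returns = lambda_returns(rewards, next_values, gamma=1.0, lam=1.0)
```

Setting `lam=0.0` gives the one-step TD targets and `lam=1.0` gives the full Monte Carlo returns. Intermediate values blend the two.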

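  10. Heuristic Function Using RL • Take the pronunciation model [2], consisting of sub-word HMMs, as the environment model. [Figure: pronunciation tree with a root node, non-terminal nodes, and word-terminal nodes; the arcs are labeled with Korean syllables (화, 전, 거, 전, 자, 홍).]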

  11. (Cont’d) • Use the algorithm to estimate the heuristic function as a normalized value function. • Algorithm description • For each training word sample • Initialize s (to the root) • While s is non-terminal • For all s
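
The algorithm listing above is cut off in the transcript and the algorithm's name is missing, so the following is only one possible reading, not the authors' method. It assumes a tabular TD(0) update along the path from the root of a pronunciation tree to a word terminal, with a per-arc score as the reward. The step size, the sample words, their scores, and the function name `td0_train` are all hypothetical, and the normalization mentioned on the slide is left out because it is not specified.

```python
from collections import defaultdict

def td0_train(word_samples, alpha=0.05, gamma=1.0, epochs=200):
    """Estimate the expected remaining score for every tree node.

    word_samples: list of words, each a list of (unit, score) pairs.
    A node is identified by the tuple of units on the path from the root.
    """
    V = defaultdict(float)
    for _ in range(epochs):
        for sample in word_samples:        # for each training word sample
            s = ()                         # initialize s to the root
            for i, (unit, score) in enumerate(sample):
                s_next = s + (unit,)
                is_terminal = i == len(sample) - 1
                # Assumed TD(0) target: arc score plus the value of the next node.
                target = score + (0.0 if is_terminal else gamma * V[s_next])
                V[s] += alpha * (target - V[s])
                s = s_next                 # loop ends when s is a word terminal
    return V

# Two hypothetical words sharing the first unit "a".
samples = [
    [("a", -1.0), ("b", -2.0)],
    [("a", -1.0), ("c", -4.0)],
]
V_tree = td0_train(samples)
```

Under these assumptions the value of the shared node ("a",) settles near -3.0, the average remaining score of the two words, and the root value near -4.0. Such values could serve as the estimate g(N) in a heuristic search.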

  12. Conclusion • LVCSR can be formulated as a graph search problem. • Heuristic search algorithms aim to focus limited computational resources on the current decision step. • RL can be used to estimate expected scalar returns, and thereby to construct a heuristic function.

  13. References [1] Richard S. Sutton and Andrew G. Barto, “Reinforcement Learning: An Introduction,” MIT Press, 1998. [2] Stefan Ortmanns and Hermann Ney, “The Time-Conditioned Approach in Dynamic Programming Search for LVCSR,” IEEE Trans. SAP, vol. 8, no. 6, November 2000. [3] Xuedong Huang, Alex Acero and Hsiao-Wuen Hon, “Spoken Language Processing,” Prentice Hall, 2001.
