A Heuristic Search Algorithm for LVCSR Intelligent Multimedia Communications Lab. at POSTECH 정홍, 박기윤
Contents • Search problem in LVCSR • Heuristic search algorithms • Reinforcement learning (RL) • A heuristic function using RL • Conclusion
Search Problem in LVCSR • Connected-word model: a sentence is modeled as a concatenation of words. • Given the word models, find the sequence of word hypotheses whose concatenated models best match the observation sequence. • Depending on how the hypothesis tests are scheduled, several variants are possible [3].
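For reference, the standard formulation (following [3]; not written out on the slide): the recognizer searches for the word sequence that best explains the acoustic observation sequence O,

\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} P(O \mid W)\, P(W),

where P(O \mid W) is the acoustic score from the concatenated word models and P(W) is the language-model score.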
(Cont’d) • Blind or exhaustive search algorithms: depth-first search, breadth-first search. • For LVCSR, each node may correspond to a word-internal state [2].
Heuristic Search Algorithms • Heuristic function • For each node N, assigns the sum h(N) of the partial path metric f(N) accumulated up to N and an estimate g(N) of the remaining path metric. • The more precise the estimates g(N) are, the more reliable the pruning operation becomes. • Heuristic search algorithm • Visit nodes with large heuristic function values first (a sketch follows below). • Focus computational resources on the current decision.
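A minimal best-first search sketch in Python, using the slide's notation h(N) = f(N) + g(N) (larger metric = better). The expand, f, g_est, and is_goal callables are assumptions standing in for the word-internal state graph of the slide, not part of the original algorithm description.

import heapq
from itertools import count

def heuristic_search(start, expand, f, g_est, is_goal):
    """Visit nodes with the largest h(N) = f(N) + g_est(N) first.

    expand(node)  -> iterable of successor nodes (stand-in for the word graph)
    f(node)       -> partial path metric accumulated up to node
    g_est(node)   -> estimate of the remaining path metric
    is_goal(node) -> True for a complete word-sequence hypothesis
    """
    tie = count()                      # tie-breaker so the heap never compares nodes
    # heapq is a min-heap, so store the negated heuristic value
    frontier = [(-(f(start) + g_est(start)), next(tie), start)]
    visited = set()
    while frontier:
        _, _, node = heapq.heappop(frontier)
        if is_goal(node):
            return node                # best complete hypothesis found first
        if node in visited:
            continue
        visited.add(node)
        for succ in expand(node):
            heapq.heappush(frontier, (-(f(succ) + g_est(succ)), next(tie), succ))
    return None                        # no complete hypothesis reachable

The more accurate g_est is, the earlier good hypotheses surface at the top of the queue, and the safer it becomes to prune the rest of the frontier.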
Reinforcement Learning (RL) • An incremental form of dynamic programming (C. J. C. H. Watkins). • The agent selects actions to maximize the expected return. [Diagram: agent-environment loop; the agent sends an action to the environment, which returns a reward and the next state.]
(Cont’d) • Given a perfect Markov state, the RL problem becomes a Markov decision process (MDP). • Policy π: a mapping from each state s and action a to the probability of taking action a when in state s. • Return: a function of the reward sequence. • Value function V^π(s): the expected return when starting in s and following π.
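In the standard notation of [1] (the symbols dropped out of the slide text), the discounted return and the state-value function under a policy π are

G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}, \qquad 0 \le \gamma \le 1,

V^{\pi}(s) = \mathbb{E}_{\pi}\left[\, G_t \mid s_t = s \,\right].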
(Cont’d) • Generalized policy iteration (GPI), for a deterministic policy • Policy evaluation: estimate the value function V^π for the given policy π. • Policy improvement: make the policy greedy with respect to the current value function. [Diagram: GPI cycle alternating between the policy and the value function of the Markov model.]
(Cont’d) • For prediction purposes, it is enough to estimate the value function. • The λ-return can be used instead of the plain discounted return to accelerate the convergence rate.
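For completeness, the λ-return averages the n-step returns (standard definition from [1]):

G_t^{\lambda} = (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{\,n-1} G_t^{(n)},

where G_t^{(n)} is the n-step return; λ = 0 recovers the one-step TD target and λ = 1 the Monte-Carlo return.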
Heuristic Function Using RL • Take the pronunciation model [2], consisting of sub-word HMMs, as the environment model. [Diagram: pronunciation prefix tree with a root node, non-terminal sub-word nodes, and word-terminal leaves.]
(Cont’d) • Use the following algorithm to estimate the heuristic function as a normalized value function. • Algorithm description • For each training word sample: • Initialize s to the root. • While s is non-terminal: update the value estimate of s and move to the successor state. • For all s: normalize the value estimates to obtain the heuristic (a sketch follows below).
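A minimal sketch of one way to realize this loop in Python, assuming a TD(0)-style value update. The toy tree layout, the reward stand-in, the step size ALPHA, and the min-max normalization are illustrative assumptions, not the algorithm from the slide.

import random
from collections import defaultdict

# Toy pronunciation prefix tree: each node maps a sub-word unit to a child node.
# Leaves ("word terminals") have no children. Purely illustrative structure.
TREE = {
    "root": {"jeon": "n1", "hong": "t3"},
    "n1":   {"hwa": "t1", "ja": "t2"},
    "t1": {}, "t2": {}, "t3": {},
}

GAMMA, ALPHA = 1.0, 0.1
V = defaultdict(float)                       # value estimate per tree node

def reward(node, unit, next_node):
    """Stand-in for the acoustic score of traversing one sub-word arc."""
    return random.uniform(-2.0, 0.0)         # e.g. a log-likelihood-like quantity

def train(word_paths, episodes=1000):
    """Estimate V with TD(0) over sampled word traversals of the tree."""
    for _ in range(episodes):
        units = random.choice(word_paths)    # one training word as a unit sequence
        s = "root"                           # initialize s to the root
        for u in units:                      # while s is non-terminal
            s_next = TREE[s][u]
            r = reward(s, u, s_next)
            V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])   # TD(0) update
            s = s_next

def heuristic():
    """Normalize the value estimates over all s to obtain the heuristic."""
    lo, hi = min(V.values()), max(V.values())
    span = (hi - lo) or 1.0
    return {s: (v - lo) / span for s, v in V.items()}

train([["jeon", "hwa"], ["jeon", "ja"], ["hong"]])
print(heuristic())

The resulting per-node values play the role of g(N) in the heuristic function of the earlier slide: nodes whose subtrees tend to yield high returns are explored first.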
Conclusion • LVCSR can be formulated as a graph search problem. • Heuristic search algorithms aim to focus limited computational resources on the current decision step. • RL can be used to estimate expected scalar returns, and thereby to construct a heuristic function.
Reference [1] Richard S. Sutton and Andrew G. Barto, “Reinforcement Learning: An Introduction,” MIT Press, 1998. [2] Stefan Ortmanns and Hermann Ney, “The Time-Conditioned Approach in Dynamic Programming Search for LVCSR,” IEEE Transactions on Speech and Audio Processing, vol. 8, no. 6, November 2000. [3] Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon, “Spoken Language Processing,” Prentice Hall, 2001.