FSG Implementation in Sphinx2
Mosur Ravishankar (rkm@cs.cmu.edu)
Jul 15, 2004
Outline
• Input specification
• FSG related API
• Application examples
• Implementation issues
FSG Specification
• "Assembly language" for specifying FSGs
  • Low-level
  • Most standards should compile down to this level
• Set of N states, numbered 0 .. N-1
• Transitions:
  • Emitting or non-emitting (aka null or epsilon)
  • Each emitting transition emits one word
  • Fixed probability 0 < p <= 1
• One start state and one final state
  • Null transitions can effectively give you as many as needed
• Goal: find the highest-likelihood path from the start state to the final state, given some input speech
An FSG Example
[Figure: a 10-state FSG with two legs — "to [city] from [city]" via states 0-1-2-3-4-9, and "from [city] to [city]" via states 0-5-6-7-8-9, joined to state 9 by null (e) transitions]

FSG_BEGIN leg
NUM_STATES 10
START_STATE 0
FINAL_STATE 9
# Transitions
T 0 1 0.5 to
T 1 2 0.1 city1
…
T 1 2 0.1 cityN
T 2 3 1.0 from
T 3 4 0.1 city1
…
T 3 4 0.1 cityN
T 4 9 1.0
T 0 5 0.5 from
T 5 6 0.1 city1
…
T 5 6 0.1 cityN
T 6 7 1.0 to
T 7 8 0.1 city1
…
T 7 8 0.1 cityN
T 8 9 1.0
FSG_END
A Better Representation
• Composition of FSGs
[Figure: the two-leg FSG with each bank of city-name transitions replaced by a reference to a [city] sub-FSG; the [city] FSG has one transition per city name (boston, chicago, pittsburgh, buffalo, seattle)]
Multiple Pronunciations and Filler Words
[Figure: the composed FSG with a [filler] self-transition added at every state]
• Alternative pronunciations added automatically
• Filler word transitions (silence and noise) added automatically
  • A filler self-transition at every state
  • Noise words added only if the noise penalty (probability) > 0
FSG Related API
• Loading during initialization (i.e., fbs_init()):
  • -fsgfn flag specifying an FSG file to load (similar to the -lmfn flag)
  • Difference: the FSG name is contained in the file
• Dynamic loading:
  • char *uttproc_load_fsgfile(char *fsgfile); returns the FSG string name contained in the file
• Switching to an FSG:
  • uttproc_set_fsg(char *fsgname);
• Deleting a previously loaded FSG:
  • uttproc_del_fsg(char *fsgname);
• Old demos can be run with FSGs simply by recompiling with the new libraries
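A minimal usage sketch of the calls listed above, assuming the Sphinx2 fbs/uttproc API declared in fbs.h. Only the -fsgfn flag and the three uttproc_*fsg* functions come from this slide; the file name leg.fsg, the surrounding fbs_init()/fbs_end() framing, and the error handling are illustrative assumptions.

    /* Sketch only: load an FSG at run time, decode with it, then remove it.
     * Assumes the Sphinx2 fbs/uttproc API (fbs.h); file and grammar names
     * are illustrative. */
    #include <stdio.h>
    #include "fbs.h"

    int main(int argc, char *argv[])
    {
        char *fsgname;

        /* Initialization; an FSG could also be loaded here via a -fsgfn argument */
        fbs_init(argc, argv);

        /* Dynamically load an FSG file; the returned name is the one on the
         * FSG_BEGIN line inside the file (here, "leg") */
        fsgname = uttproc_load_fsgfile("leg.fsg");
        if (fsgname == NULL) {
            fprintf(stderr, "Failed to load leg.fsg\n");
            return 1;
        }

        /* Make this FSG the active decoding grammar */
        uttproc_set_fsg(fsgname);

        /* ... run utterances as usual (uttproc_begin_utt / uttproc_rawdata /
         *     uttproc_end_utt / uttproc_result) ... */

        /* Discard the FSG when it is no longer needed */
        uttproc_del_fsg(fsgname);

        fbs_end();
        return 0;
    }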
Mixed LM/FSG Decoding Example
• (See lm_fsg_test.c)
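lm_fsg_test.c is not reproduced in the slides; the fragment below is only a guess at its core idea, switching the decoder between an N-gram LM and an FSG from one utterance to the next. uttproc_set_lm() and the names "default" and "leg" are assumptions, not taken from the slide; only uttproc_set_fsg() appears in the API list above.

    #include "fbs.h"   /* Sphinx2 decoder API; header name may differ by install */

    /* Select the active grammar before starting the next utterance.
     * uttproc_set_lm() and the names "default"/"leg" are assumptions. */
    static void select_grammar(int use_fsg)
    {
        if (use_fsg)
            uttproc_set_fsg("leg");      /* FSG loaded earlier via uttproc_load_fsgfile() */
        else
            uttproc_set_lm("default");   /* fall back to the N-gram LM */
    }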
Another Example: Garbage Models
[Figure: the two-leg FSG with [allphone] garbage-model transitions added alongside the to/from and [city] transitions]
• Extraneous speech could be absorbed using an allphone "garbage model"
B/W Training and Forced Alignment
• Consolidate code for FSGs, Baum-Welch training, and forced alignment?
  • Sentence HMMs for training and alignment are essentially linear FSGs (see the sketch below)
  • Alternative pronunciations and filler words handled automatically
• Differences:
  • B/W uses the forward (and backward) algorithm instead of Viterbi
  • Alignment has to produce phone and state segmentations as well
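For illustration, a forced-alignment transcript such as "to boston from seattle" could be written as a strictly linear FSG in the file format shown earlier. The transcript and grammar name are made up; whether the trainer/aligner would consume exactly this form is an assumption about the proposed consolidation, not something stated on the slide.

    FSG_BEGIN align_sent
    NUM_STATES 5
    START_STATE 0
    FINAL_STATE 4
    # One word per transition, probability 1.0: a strictly linear path
    T 0 1 1.0 to
    T 1 2 1.0 boston
    T 2 3 1.0 from
    T 3 4 1.0 seattle
    FSG_END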
Implementation
• Straightforward expansion of the word-level FSG into a triphone HMM network
• Viterbi beam search over this HMM network
• No major optimizations attempted (so far):
  • No lextree implementation (What?)
  • Static allocation of all HMMs; not allocated "on demand" (Oh, no!)
  • FSG transitions represented by an NxN matrix (You can't be serious!!); a sketch of this layout follows the list
  • Speed/memory usage profile needs to be evaluated
• Mostly a new set of data structures, separate from the existing ones
• Should be easily ported to Sphinx3
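The actual Sphinx2 structures are not shown on the slide; this is a minimal sketch assuming only what the list above states (N states, an NxN matrix of word transitions, everything allocated up front rather than on demand). All names are illustrative and do not come from the Sphinx2 source; error handling is omitted.

    #include <stdlib.h>

    /* Illustrative layout matching the description above, not Sphinx2 code. */
    typedef struct {
        int   from_state;   /* source FSG state */
        int   to_state;     /* destination FSG state */
        float logprob;      /* log of the transition probability */
        int   word_id;      /* emitted word, or -1 for a null transition */
    } fsg_trans_t;

    typedef struct {
        int           n_state;      /* N states, numbered 0 .. N-1 */
        int           start_state;
        int           final_state;
        /* NxN matrix, indexed as trans[i * n_state + j]; NULL means no
         * transition from state i to state j (simplified to at most one
         * word transition per state pair). */
        fsg_trans_t **trans;
    } word_fsg_t;

    static word_fsg_t *word_fsg_alloc(int n_state, int start, int final)
    {
        word_fsg_t *fsg = calloc(1, sizeof(*fsg));
        fsg->n_state     = n_state;
        fsg->start_state = start;
        fsg->final_state = final;
        /* Static, up-front allocation of the full NxN transition matrix */
        fsg->trans = calloc((size_t)n_state * n_state, sizeof(*fsg->trans));
        return fsg;
    }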
Implementation: FSG Expansion to HMMs
[Figure: word-level FSG transitions out of state 0 (word1, word2) expanded into chains of phone HMMs — p1 p2 p3 p4 for word1, q1 q2 q3 for word2]
Implementation: Triphone HMMs
[Figure: the phone chain for word1 (p1 p2 p3 p4) expanded into triphone HMMs, with multiple root HMMs (p1, p1', p1'') for different left contexts and multiple leaf HMMs (p4, p4', p4'') for different right contexts]
• Multiple leaf HMMs for different right contexts
• Multiple root HMMs for different left contexts
• 1-phone words use SIL as right context
• Special case for 2-phone words
Possible Optimization: Lextrees
[Figure: the phone chains for word1 (p1 p2 p3 p4) through wordN (q1 q2 q3) leaving a source state, merged into a lextree associated with that state]
Possible Optimization: Path Pruning
[Figure: two transitions labelled w entering the same state]
• If there are two transitions with the same label into the same state, the one starting out with a worse score can be pruned (see the sketch below)
• But reconciling this with lextrees is tricky, since labels are then blurred
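A sketch of the pruning rule above, not Sphinx2 code: before expanding a word-labelled transition into its HMM network, check whether another transition with the same label into the same destination state already starts from a better path score, and skip it if so. The table sizes, names, and per-frame reset are illustrative assumptions.

    #include <limits.h>

    #define N_STATE  64      /* assumed number of FSG states (illustrative) */
    #define N_WORD   1024    /* assumed vocabulary size (illustrative) */

    /* Best path score so far entering (destination state, word label) */
    static int best_entry_score[N_STATE][N_WORD];

    /* Reset before processing each frame's transitions */
    static void reset_entry_scores(void)
    {
        for (int s = 0; s < N_STATE; s++)
            for (int w = 0; w < N_WORD; w++)
                best_entry_score[s][w] = INT_MIN;   /* no path seen yet */
    }

    /* Returns nonzero if the transition should be kept (i.e., not pruned) */
    static int keep_transition(int to_state, int word_id, int path_score)
    {
        if (path_score <= best_entry_score[to_state][word_id])
            return 0;    /* a better same-label path already enters this state */
        best_entry_score[to_state][word_id] = path_score;
        return 1;
    }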
Other Issues Pending
• Dynamic allocation and management of HMMs
• Implementation of absolute pruning
• Lattice generation
• N-best list generation
• …
Where Is It?
• My copy of the open-source version of Sphinx2
• Someone needs to update the SourceForge copy
• HTML documentation has been updated