Dynamic Match Lattice Spotting
Spoken Term Detection Evaluation
Queensland University of Technology
Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof Sridha Sridharan
Presented by Roy Wallace
Overview
• Phonetic-based index: open-vocabulary
• Based on lattice-spotting technique
• Two-tier database
• Dynamic-match rules
• Algorithmic optimisations
NOTE: Patented technology
Concept
[Figure: the term "greasy" and its phone decomposition, marked "?", shown against a stream of phones: g r ax s iy th ay n r nx ow d m nx ae …]
Concept
[Figure: dynamic matching of a target sequence against observed sequences, with costs; the example phones shown are ax and ih]
Indexing
[Figure: indexing pipeline. Boxes: Audio, Feature Extraction, Segmentation, Speech Recognition, Lattices, Sequence Generation, Sequence DB, Hyper-Sequence Generation, Hyper-Sequence DB]
Hyper-sequence Mapping
• Map individual phones to “parent” classes
• We use Vowels, Fricatives, Glides, Stops and Nasals
• Simple example
  • Parent classes: Vowels, Consonants
  • Map each phone to parent class to create hyper-sequence
[Figure: Sequence DB and Hyper-Sequence DB]
Hyper-sequence Mapping
[Figure: example with the labels "Search term:" (against the Sequence DB) and "Hyper-sequence:" (against the Hyper-sequence DB)]
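The simple two-class example on the hyper-sequence slides can be sketched in Python. This is only an illustration: the vowel inventory, the function names and the toy database are assumptions, and requiring an exact hyper-sequence match is a simplification of whatever the real two-tier lookup does.

```python
from collections import defaultdict

# Assumed vowel inventory for illustration; every other phone counts as a consonant.
VOWELS = {"aa", "ae", "ah", "ax", "ay", "eh", "ih", "iy", "ow", "uw"}


def to_hyper_sequence(phones):
    """Map each phone to its parent class: 'V' (vowel) or 'C' (consonant)."""
    return tuple("V" if p in VOWELS else "C" for p in phones)


def build_hyper_db(sequence_db):
    """Group stored phone sequences under their hyper-sequence (second tier)."""
    hyper_db = defaultdict(list)
    for seq in sequence_db:
        hyper_db[to_hyper_sequence(seq)].append(seq)
    return hyper_db


# Toy sequence database (hypothetical contents).
sequence_db = [
    ("g", "r", "ax", "s", "iy"),
    ("th", "ay", "n", "r", "nx"),
    ("k", "r", "iy", "z", "ih"),
]
hyper_db = build_hyper_db(sequence_db)

# Only sequences sharing the term's hyper-sequence need detailed matching.
term = ("g", "r", "iy", "s", "iy")
candidates = hyper_db.get(to_hyper_sequence(term), [])
print(to_hyper_sequence(term))
print(candidates)
```

The point of the coarse tier is that it narrows the set of sequences that must be compared phone by phone.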
Searching
[Figure: search pipeline. Boxes: Term, Phone decomp., Split long terms, Hyper-mapping, Dynamic Matching, Merge long terms, Keyword Verification, Results; databases: Hyper-Sequence DB, Sequence DB]
Dynamic Matching
• Minimum Edit Distance (MED)
  • i.e. Levenshtein Distance
• Insertions, deletions, substitutions
• Finds minimum cost of transformation
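For reference, a minimal Python sketch of the unit-cost Minimum Edit Distance between a target and an observed phone sequence. The function name and the example sequences are illustrative assumptions, not taken from the system.

```python
def min_edit_distance(target, observed):
    """Levenshtein distance between two phone sequences (all edits cost 1)."""
    # prev[j] = cost of turning the first i-1 target phones into observed[:j]
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, start=1):
        curr = [i]  # turning i target phones into nothing takes i deletions
        for j, o in enumerate(observed, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion of t
                curr[j - 1] + 1,          # insertion of o
                prev[j - 1] + (t != o),   # substitution, or free match
            ))
        prev = curr
    return prev[-1]


target = "g r iy s iy".split()
observed = "g r ax s iy".split()
print(min_edit_distance(target, observed))  # 1 (one substitution: iy -> ax)
```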
Dynamic Matching
• Substitution costs
  • Derived from phone confusion statistics
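A sketch of how phone-dependent substitution costs might plug into the edit-distance recursion. The cost values below are invented for illustration; the slides say only that real costs are derived from phone confusion statistics, and do not say how. The insertion and deletion costs of 1.0 are likewise assumptions.

```python
# Hypothetical substitution costs keyed by (target phone, observed phone).
SUB_COST = {("iy", "ih"): 0.2, ("iy", "ax"): 0.5, ("s", "z"): 0.3}


def sub_cost(t, o):
    """Cost of observing phone o where phone t was expected."""
    if t == o:
        return 0.0
    return SUB_COST.get((t, o), 1.0)  # assumed default for unlisted pairs


def weighted_med(target, observed, ins_cost=1.0, del_cost=1.0):
    """Minimum edit distance with phone-dependent substitution costs."""
    prev = [j * ins_cost for j in range(len(observed) + 1)]
    for i, t in enumerate(target, start=1):
        curr = [i * del_cost]
        for j, o in enumerate(observed, start=1):
            curr.append(min(
                prev[j] + del_cost,
                curr[j - 1] + ins_cost,
                prev[j - 1] + sub_cost(t, o),
            ))
        prev = curr
    return prev[-1]


print(weighted_med("g r iy s iy".split(), "g r ax s iy".split()))  # 0.5
```

With costs like these, an acoustically plausible confusion is penalised less than an arbitrary one, so likely misrecognitions of the target still rank highly.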
Optimisations
• Prefix sequence optimisation
• Early stopping optimisation
• Linearised MED search approximation
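The slides do not describe these optimisations in detail. As one possible reading of "early stopping", the Python sketch below abandons an edit-distance computation as soon as the result can no longer fall under a cost threshold. The function name, the unit costs and the threshold are assumptions, and the actual optimisation may work differently.

```python
def bounded_med(target, observed, max_cost):
    """Unit-cost edit distance, or None once it must exceed max_cost."""
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, start=1):
        curr = [i]
        for j, o in enumerate(observed, start=1):
            curr.append(min(
                prev[j] + 1,
                curr[j - 1] + 1,
                prev[j - 1] + (t != o),
            ))
        # With non-negative costs the row minimum never decreases, so once
        # every cell exceeds the threshold the final distance must too.
        if min(curr) > max_cost:
            return None
        prev = curr
    return prev[-1] if prev[-1] <= max_cost else None


term = "g r iy s iy".split()
print(bounded_med(term, "g r ax s iy".split(), max_cost=1))
print(bounded_med(term, "th ay n r nx".split(), max_cost=1))
```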
Long Term Merging
[Figure: the term "olympic sites" is split into "olympic" and "sites"; each part is searched separately and the results are merged]
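A rough Python sketch of the merge step for a split term such as "olympic sites". The hit format (start time, end time, cost), the maximum gap and the rule of summing costs are all assumptions made for illustration.

```python
def merge_hits(first_hits, second_hits, max_gap=0.5):
    """Join hits for two term parts when the second closely follows the first.

    Each hit is a (start_sec, end_sec, cost) tuple (hypothetical layout).
    """
    merged = []
    for s1, e1, c1 in first_hits:
        for s2, e2, c2 in second_hits:
            # Keep the pair only if part two starts within max_gap seconds
            # after part one ends.
            if 0.0 <= s2 - e1 <= max_gap:
                merged.append((s1, e2, c1 + c2))
    return merged


olympic_hits = [(12.0, 12.5, 0.5), (40.0, 40.4, 0.0)]
sites_hits = [(12.6, 13.0, 0.2), (75.0, 75.4, 0.0)]
print(merge_hits(olympic_hits, sites_hits))
```

In this toy example only the first "olympic" hit has a "sites" hit close enough behind it, so a single merged result spanning 12.0 to 13.0 seconds survives.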
Keyword Verification
• Acoustic
  • Use acoustic score from lattice to boost occurrences with high confidence
• Neural Network
  • Produce a confidence score by fusing
    • MED score and Acoustic score
    • Term phone length
    • Term phone classes
Results Maximum Term-Weighted Value on EvalSet terms
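For readers unfamiliar with the metric, here is a sketch of how Term-Weighted Value is computed, following my understanding of the NIST 2006 Spoken Term Detection evaluation plan (beta = 999.9, non-target trials counted as seconds of speech minus true occurrences). The function names and data layout are assumptions, and the official scoring tools remain the authority.

```python
def term_weighted_value(counts, speech_seconds, beta=999.9):
    """TWV at one decision threshold.

    counts maps each term to (n_correct, n_spurious, n_true).
    """
    losses = []
    for n_correct, n_spurious, n_true in counts.values():
        if n_true == 0:
            continue  # terms with no true occurrences are left out of the average
        p_miss = 1.0 - n_correct / n_true
        p_fa = n_spurious / (speech_seconds - n_true)
        losses.append(p_miss + beta * p_fa)
    if not losses:
        raise ValueError("no terms with true occurrences")
    return 1.0 - sum(losses) / len(losses)


def maximum_twv(counts_by_threshold, speech_seconds):
    """MTWV: the best TWV over all candidate decision thresholds."""
    return max(term_weighted_value(c, speech_seconds)
               for c in counts_by_threshold.values())


perfect = {"greasy": (2, 0, 2)}
silent = {"greasy": (0, 0, 2)}
print(term_weighted_value(perfect, speech_seconds=3600.0))  # 1.0
print(term_weighted_value(silent, speech_seconds=3600.0))   # 0.0
```

A perfect system scores 1.0 and a system that outputs nothing scores 0.0; false alarms can push the value below zero.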
Conclusion
• Open-vocabulary and phone-based
• Patented technology utilises
  • sequence and hyper-sequence databases
  • optimisations for rapid searches
• Advantages
  • Other languages
  • Economy of scale
Conclusion
• Limitations
  • Indexing speed and size
  • Need to split long sequences
• Future work
  • Keyword Verification
    • Word-level information (e.g. LVCSR)
    • Acoustic features (e.g. prosody)
  • Indexing/searching frameworks
  • Spoken Document Retrieval and other semantic applications
References
• A. J. K. Thambiratnam, “Acoustic keyword spotting in speech with applications to data mining”, Ph.D. dissertation, Queensland University of Technology, Qld, March 2005.
• K. Thambiratnam and S. Sridharan, “Rapid Yet Accurate Speech Indexing Using Dynamic Match Lattice Spotting”, IEEE Transactions on Audio, Speech and Language Processing, accepted for future publication.
• CMU Speech group (1998). The Carnegie Mellon Pronouncing Dictionary. [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
• S. J. Young, P. C. Woodland, W. J. Byrne (2002). “HTK: Hidden Markov Model Toolkit V3.2”, Cambridge University Engineering Department, Speech Group and Entropic Research Laboratories Inc.
• V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals”, Soviet Physics Doklady, 10(8), 1966, pp. 707-710.