Dynamic Match Lattice Spotting

Spoken Term Detection Evaluation
Queensland University of Technology
Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof Sridha Sridharan
Presented by Roy Wallace

Overview
• Phonetic-based index → open-vocabulary
• Based on lattice-spotting technique
• Two-tier database
• Dynamic-match rules
• Algorithmic optimisations
NOTE: Patented technology

Concept
[Figure: phone decomposition of the search term "greasy" (g r ax s iy), shown against a stream of indexed phones (th ay n r nx ow d m nx ae …)]

Concept
[Figure: dynamic matching of a target sequence against observed sequences, with costs for phone differences (phones shown: ax, ih)]

Indexing
[Diagram blocks: Audio, Segmentation, Feature Extraction, Speech Recognition, Lattices, Sequence Generation, Sequence DB, Hyper-Sequence Generation, Hyper-Sequence DB]

Hyper-sequence Mapping
• Map individual phones to “parent” classes
• We use Vowels, Fricatives, Glides, Stops and Nasals
• Simple example
  • Parent classes: Vowels, Consonants
  • Map each phone to parent class to create hyper-sequence
[Diagram labels: Sequence DB, Hyper-Sequence DB]
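The simple two-class example can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: the vowel set and the function name `to_hyper_sequence` are assumptions made here, and only the phones used in the example are covered.

```python
# Illustrative two-class mapping (Vowels / Consonants) from the simple example.
# ASSUMPTION: this vowel set is hypothetical and covers only the phones used here.
VOWELS = {"ax", "iy", "ih", "ay", "ae", "ow"}

def to_hyper_sequence(phones):
    """Map each phone to its parent class: 'V' for vowels, 'C' otherwise."""
    return tuple("V" if p in VOWELS else "C" for p in phones)

greasy = ["g", "r", "ax", "s", "iy"]
print(to_hyper_sequence(greasy))  # ('C', 'C', 'V', 'C', 'V')
```

With the five classes named on the slide, the same function would use a phone-to-class dictionary instead of a single vowel set.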

Hyper-sequence Mapping
[Diagram labels: Search term, Sequence DB; Hyper-sequence, Hyper-Sequence DB]

Searching
[Diagram blocks: Term, Phone decomposition, Split long terms, Hyper-mapping, Dynamic Matching, Merge long terms, Keyword Verification, Results; databases: Hyper-Sequence DB, Sequence DB]
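One way to picture the two-tier lookup is as a coarse filter followed by a fine match. The sketch below is an assumption-laden illustration: it filters by exact hyper-sequence equality, which is a simplification chosen here, and the names `hyper`, `build_index` and `candidates` are hypothetical. Dynamic matching would then score only the surviving candidates.

```python
from collections import defaultdict

# ASSUMPTION: hypothetical vowel set, two-class mapping as in the simple example.
VOWELS = {"ax", "iy", "ih", "ay", "ae", "ow"}

def hyper(phones):
    """Two-class hyper-sequence: 'V' for vowels, 'C' for everything else."""
    return tuple("V" if p in VOWELS else "C" for p in phones)

def build_index(sequences):
    """Group observed phone sequences under their hyper-sequence (the coarse tier)."""
    index = defaultdict(list)
    for seq in sequences:
        index[hyper(seq)].append(tuple(seq))
    return index

def candidates(index, target):
    """Return only sequences whose hyper-sequence equals the target's.
    Exact equality is a simplification made for this sketch."""
    return index.get(hyper(target), [])

observed = [("g", "r", "ih", "s", "iy"), ("th", "ay", "n"), ("d", "r", "ax", "z", "iy")]
index = build_index(observed)
print(candidates(index, ["g", "r", "ax", "s", "iy"]))
# [('g', 'r', 'ih', 's', 'iy'), ('d', 'r', 'ax', 'z', 'iy')]
```

The point of the coarse tier is that most of the sequence database is never touched by the more expensive matching step.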

Dynamic Matching
• Minimum Edit Distance (MED), i.e. Levenshtein distance
• Insertions, deletions, substitutions
• Finds minimum cost of transformation
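For reference, a minimal sketch of the standard Minimum Edit Distance recurrence over phone sequences. Uniform unit costs and the function name `med` are assumptions for illustration; they are not the costs used in the system.

```python
def med(target, observed, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Minimum cost of transforming `target` into `observed`
    using insertions, deletions and substitutions (Levenshtein-style)."""
    rows, cols = len(target) + 1, len(observed) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):          # delete every target phone
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, cols):          # insert every observed phone
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            step = 0.0 if target[i - 1] == observed[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j] + del_cost,     # deletion
                          d[i][j - 1] + ins_cost,     # insertion
                          d[i - 1][j - 1] + step)     # match or substitution
    return d[-1][-1]

target = ["g", "r", "ax", "s", "iy"]
print(med(target, ["g", "r", "ih", "s", "iy"]))  # 1.0 (one substitution: ax -> ih)
print(med(target, ["g", "r", "s", "iy"]))        # 1.0 (one deletion: ax)
```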

Dynamic Matching
• Substitution costs
  • Derived from phone confusion statistics
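The slide does not give the formula used to turn confusion statistics into costs. One plausible choice, shown here purely as an assumption, is the negative log of the estimated confusion probability, so that frequently confused phones are cheap to substitute. The counts, the `floor` parameter and the function name are hypothetical.

```python
import math

def substitution_costs(confusions, floor=1e-6):
    """confusions[ref][hyp] = number of times reference phone `ref` was recognised as `hyp`.
    Returns {(ref, hyp): cost}. ASSUMPTION: cost = -log P(hyp | ref), zero for a match."""
    costs = {}
    for ref, row in confusions.items():
        total = sum(row.values())
        for hyp, count in row.items():
            p = max(count / total, floor)   # floor avoids log(0) for unseen pairs
            costs[(ref, hyp)] = 0.0 if ref == hyp else -math.log(p)
    return costs

# Hypothetical confusion counts for the reference phone "ax".
counts = {"ax": {"ax": 80, "ih": 15, "iy": 5}}
costs = substitution_costs(counts)
print(costs[("ax", "ih")] < costs[("ax", "iy")])  # True: ax -> ih is the cheaper substitution
```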

Optimisations
• Prefix sequence optimisation
• Early stopping optimisation
• Linearised MED search approximation
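The slide names these optimisations without detail. As an illustration of the early-stopping idea only, the sketch below abandons an edit-distance computation as soon as every entry in the current row exceeds a cost threshold; with non-negative costs the final distance can never drop below that row minimum. Unit costs, the `max_cost` parameter and the function name are assumptions.

```python
def med_with_early_stop(target, observed, max_cost, ins=1.0, dele=1.0, sub=1.0):
    """Edit distance from `target` to `observed`, or None if it must exceed `max_cost`.
    Keeps only one previous row of the dynamic-programming table."""
    prev = [j * ins for j in range(len(observed) + 1)]
    for t in target:
        curr = [prev[0] + dele]
        for j, o in enumerate(observed, start=1):
            step = 0.0 if t == o else sub
            curr.append(min(prev[j] + dele, curr[j - 1] + ins, prev[j - 1] + step))
        if min(curr) > max_cost:
            return None            # no alignment can recover: stop early
        prev = curr
    return prev[-1] if prev[-1] <= max_cost else None

target = ["g", "r", "ax", "s", "iy"]
print(med_with_early_stop(target, ["g", "r", "ih", "s", "iy"], max_cost=1.0))  # 1.0
print(med_with_early_stop(target, ["th", "ay", "n", "m", "d"], max_cost=1.0))  # None
```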

Long Term Merging
[Diagram: the long term "olympic sites" is split into parts, each part is searched separately, and the results are merged]
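A minimal sketch of the merge step, assuming each sub-term search returns hits as (start, end, cost) tuples in seconds. The adjacency tolerance `max_gap`, the summing of costs and the example hits are all assumptions made for illustration; the slides do not specify how hits are joined.

```python
def merge_hits(first_hits, second_hits, max_gap=0.5):
    """Join a hit for the first part with a hit for the second part when the
    second starts within `max_gap` seconds after the first ends.
    ASSUMPTION: the merged cost is the sum of the two partial costs."""
    merged = []
    for s1, e1, c1 in first_hits:
        for s2, e2, c2 in second_hits:
            if 0.0 <= s2 - e1 <= max_gap:
                merged.append((s1, e2, c1 + c2))
    return merged

# Hypothetical hits for the two parts of "olympic sites".
olympic = [(10.0, 10.6, 0.5), (42.0, 42.5, 1.0)]
sites = [(10.7, 11.1, 0.0), (80.0, 80.4, 0.2)]
print(merge_hits(olympic, sites))  # [(10.0, 11.1, 0.5)]
```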

Keyword Verification
• Acoustic
  • Use acoustic score from lattice to boost occurrences with high confidence
• Neural Network
  • Produce a confidence score by fusing
    • MED score and Acoustic score
    • Term phone length
    • Term phone classes

Results
Maximum Term-Weighted Value (MTWV) on EvalSet terms
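For readers unfamiliar with the metric, the following restates Term-Weighted Value as defined for the NIST 2006 Spoken Term Detection evaluation; the slides do not spell it out, so the symbols below are introduced here. T is the set of search terms, θ the decision threshold, and β = 999.9 in that evaluation.

```latex
\mathrm{TWV}(\theta) = 1 - \frac{1}{|T|} \sum_{t \in T}
  \left[ P_{\mathrm{miss}}(t,\theta) + \beta \, P_{\mathrm{FA}}(t,\theta) \right],
\qquad
\mathrm{MTWV} = \max_{\theta} \, \mathrm{TWV}(\theta)

P_{\mathrm{miss}}(t,\theta) = 1 - \frac{N_{\mathrm{correct}}(t,\theta)}{N_{\mathrm{true}}(t)},
\qquad
P_{\mathrm{FA}}(t,\theta) = \frac{N_{\mathrm{spurious}}(t,\theta)}{T_{\mathrm{speech}} - N_{\mathrm{true}}(t)}
```

Here T_speech is the duration of the searched audio in seconds. A perfect system scores 1, and a system that returns nothing scores 0.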

Conclusion
• Open-vocabulary and phone-based
• Patented technology utilises
  • sequence and hyper-sequence databases
  • optimisations for rapid searches
• Advantages
  • Other languages
  • Economy of scale

Conclusion
• Limitations
  • Indexing speed and size
  • Need to split long sequences
• Future work
  • Keyword Verification
  • Word-level information (e.g. LVCSR)
  • Acoustic features (e.g. prosody)
  • Indexing/searching frameworks
  • Spoken Document Retrieval and other semantic applications

References
• A. J. K. Thambiratnam, “Acoustic keyword spotting in speech with applications to data mining”, Ph.D. dissertation, Queensland University of Technology, Qld, March 2005.
• K. Thambiratnam and S. Sridharan, “Rapid Yet Accurate Speech Indexing Using Dynamic Match Lattice Spotting”, IEEE Transactions on Audio, Speech and Language Processing: accepted for future publication.
• CMU Speech group (1998). The Carnegie Mellon Pronouncing Dictionary. [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
• S. J. Young, P. C. Woodland, W. J. Byrne (2002). “HTK: Hidden Markov Model Toolkit V3.2”, Cambridge University Engineering Department, Speech Group and Entropic Research Laboratories Inc.
• V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals”, Soviet Physics Doklady, 10(8), 1966, pp. 707-710.
