Word-subword based keyword spotting with implications in OOV detection

Jan “Honza” Černocký, Igor Szöke, Mirko Hannemann, Stefan Kombrink. Brno University of Technology, BUT Speech@FIT. 44th Asilomar Conference on Signals, Systems and Computers, 8.11.2010.

Presentation Transcript


  1. Word-subword based keyword spotting with implications in OOV detection. Jan “Honza” Černocký, Igor Szöke, Mirko Hannemann, Stefan Kombrink. Brno University of Technology, BUT Speech@FIT. 44th Asilomar Conference on Signals, Systems and Computers, 8.11.2010

  2. Agenda • Word-based STD, OOV problem, subwords • Experiments • Sub-word units • Hybrid word-subword system • What can we do with OOVs • Conclusion ASILOMAR SS & C Černocký, Szöke, Hannemann, Kombrink 8.11.2010

  3. Goal of STD and glossary of terms. Goal: detect keywords or key phrases in input speech and, for each detection, output: • Identity • Position • Score. Glossary: • Large Vocabulary Continuous Speech Recognizer – LVCSR – a system converting speech into text. • Out-of-vocabulary – OOV – a word that is not in the LVCSR vocabulary. • Term – a textual entry consisting of one or more words in sequence. • Spoken Term Detection – STD – a way to search for a term in spoken data. • Subword(s) – unit(s) that are parts of words (phones, syllables, automatically found units, etc.).

  4. Word-based STD • Thanks to the language model, word-based STD systems reach better accuracy than purely acoustic ones.

  5. Implementation • The term is searched for in the recognition lattice. • The lattice allows the posterior probability of a term to be estimated.
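
How a posterior can be read off a lattice may be illustrated with a small forward-backward pass. This is only a sketch, not the authors' implementation: the lattice layout (tuples of start node, end node, label, likelihood), the topological node numbering and the name `link_posteriors` are assumptions made for the example.

```python
from collections import defaultdict


def link_posteriors(links, start, end):
    """Posterior probability of every link in an acyclic lattice.

    links: list of (from_node, to_node, label, likelihood).
    Assumption of this sketch: nodes are integers numbered in
    topological order, so from_node < to_node for every link.
    """
    alpha = defaultdict(float)  # probability mass of all paths start -> node
    beta = defaultdict(float)   # probability mass of all paths node -> end
    alpha[start] = 1.0
    beta[end] = 1.0

    # Forward pass: visit links in order of their source node.
    for src, dst, _, p in sorted(links, key=lambda link: link[0]):
        alpha[dst] += alpha[src] * p

    # Backward pass: visit links in reverse order of their target node.
    for src, dst, _, p in sorted(links, key=lambda link: link[1], reverse=True):
        beta[src] += p * beta[dst]

    total = alpha[end]  # mass of all complete paths
    return [(label, alpha[src] * p * beta[dst] / total)
            for src, dst, label, p in links]


# Toy lattice with two competing hypotheses for the same span.
toy_lattice = [
    (0, 1, "THIS", 1.0),
    (1, 2, "AMEX", 0.3),
    (1, 2, "EXAMPLE", 0.7),
    (2, 3, "OUTPUT", 1.0),
]
toy_posteriors = dict(link_posteriors(toy_lattice, start=0, end=3))
```

A term detection would then be reported with the posterior of the matching link (or chain of links) as its score.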

  6. The OOV problem. REF: THIS IS AN EXAMPLE OF RECOGNIZER OUTPUT. REC: THIS IS AMEX APPLE OF RECOGNIZER OUTPUT. • One OOV causes several errors: • The OOV cannot be found (in the output of the LVCSR). • The OOV impairs recognition of neighboring words. • An OOV usually carries a lot of information (named entity). • We need to handle OOVs! • Word accuracy. • Spoken term detection accuracy. • Practical (memory, CPU, index size, etc.).

  7. Answer to OOV problem – sub-word STD • A subword recognizer is built (its output is a subword lattice). • The term is converted from words to a sequence of subwords. • This sequence is searched for in the subword lattice.

  8. Agenda • Word-based STD, OOV problem, subwords • Experiments • Sub-word units • Hybrid word-subword system • What can we do with OOVs • Conclusion

  9. Evaluation - TWV (Term-Weighted Value) • Defined by NIST for the NIST STD 2006 evaluation: • one number • higher is better • depends on score normalization • Requires a full STD system
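
For reference, the NIST STD 2006 evaluation plan defines TWV(θ) = 1 - average over terms of [P_miss(term, θ) + β · P_FA(term, θ)], with β = 999.9, P_miss = 1 - N_correct / N_true and P_FA = N_false_alarm / (T_speech - N_true), where T_speech is the amount of speech in seconds. A minimal sketch of that computation follows; the input layout and the function name are assumptions of the example, not part of the slides.

```python
BETA = 999.9  # false-alarm weight used in the NIST STD 2006 evaluation plan


def term_weighted_value(counts, speech_seconds, beta=BETA):
    """TWV at one fixed decision threshold.

    counts: dict term -> (n_true, n_correct, n_false_alarm), i.e. the
    detection statistics obtained after thresholding (hypothetical layout).
    Terms that never occur in the reference are skipped.
    """
    values = []
    for n_true, n_correct, n_false_alarm in counts.values():
        if n_true == 0:
            continue
        p_miss = 1.0 - n_correct / n_true
        p_fa = n_false_alarm / (speech_seconds - n_true)
        values.append(1.0 - (p_miss + beta * p_fa))
    if not values:
        raise ValueError("no term occurs in the reference")
    return sum(values) / len(values)
```

Because β is large, a single false alarm in 3 h of speech costs a term roughly 0.09, which is why score normalization and threshold choice matter so much for this metric.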

  10. Normalization-independent evaluation - UBTWV • UBTWV - Upper Bound Term Weighted Value • Finds the optimum threshold for each term • one number • higher is better • Independent of normalization
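
The per-term threshold search can be sketched as a sweep over each term's detections sorted by score. The sketch assumes the NIST STD 2006 weighting (β = 999.9, false-alarm probability normalized by seconds of speech minus true occurrences) and distinct scores within a term; the data layout and the name `upper_bound_twv` are hypothetical.

```python
def upper_bound_twv(detections, n_true, speech_seconds, beta=999.9):
    """Average over terms of the best per-term value 1 - (P_miss + beta * P_FA).

    detections: dict term -> list of (score, is_hit)
    n_true:     dict term -> number of reference occurrences
    """
    per_term = []
    for term, true_count in n_true.items():
        if true_count == 0:
            continue
        # Threshold above every score: everything is missed, no false alarms.
        best = 0.0
        hits = 0
        false_alarms = 0
        ranked = sorted(detections.get(term, []), key=lambda d: d[0], reverse=True)
        for _, is_hit in ranked:
            # Lower the threshold just enough to accept one more detection.
            if is_hit:
                hits += 1
            else:
                false_alarms += 1
            p_miss = 1.0 - hits / true_count
            p_fa = false_alarms / (speech_seconds - true_count)
            best = max(best, 1.0 - (p_miss + beta * p_fa))
        per_term.append(best)
    if not per_term:
        raise ValueError("no term occurs in the reference")
    return sum(per_term) / len(per_term)
```

Since every term gets its own ideal threshold, the result does not depend on how scores are normalized across terms.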

  11. Data • NIST STD 2006 evaluation data. • 3 h of English telephone conversations. • 373 terms, 1-4 words long, occurring 4737/196 times.

  12. Recognizer I. • LVCSR developed in the AMI/AMIDA project. • State-of-the-art system including VTLN, MPE, posterior features, SAT, 3 passes. • Acoustic models trained on 278 h of speech. • Language model trained on 977M word tokens (50k vocabulary). • Dictionary pruned to generate OOVs -> WRDRED. • Word accuracy – 69.04%.

  13. Results • Words • Words converted to phones • Phone recognizer. Phones are too small => longer units are needed.

  14. Agenda • Word-based STD, OOV problem, subwords • Experiments • Sub-word units • Hybrid word-subword system • What can we do with OOVs • Conclusion

  15. Better subwords – phone multigrams • Statistics of phone n-grams (up to length 6) are collected from training data (phone transcriptions of speech). • Probabilities of all units are estimated. • The training data are segmented into the most probable sequence of multigrams. • Statistics are recomputed and rarely occurring units are deleted. This is repeated for several iterations. • An n-gram language model is estimated on top of the multigram segmentation of the training data.
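
The training loop above can be sketched as follows. This is an illustrative simplification, not the authors' code: it uses unigram unit probabilities, a hard Viterbi segmentation, and a hypothetical `min_count` pruning threshold; single phones are always kept with a floor count of 1 so that every utterance stays segmentable (an assumption of this sketch). The final n-gram language model step is not shown.

```python
import math
from collections import Counter

MAX_LEN = 6  # longest multigram, as stated on the slide


def init_counts(corpus, max_len=MAX_LEN):
    """Count every phone n-gram of length 1..max_len in the corpus."""
    counts = Counter()
    for phones in corpus:
        for i in range(len(phones)):
            for n in range(1, max_len + 1):
                if i + n <= len(phones):
                    counts[tuple(phones[i:i + n])] += 1
    return counts


def segment(phones, probs, max_len=MAX_LEN):
    """Viterbi search for the most probable segmentation into known units."""
    best = [(-math.inf, None)] * (len(phones) + 1)  # (log-prob, backpointer)
    best[0] = (0.0, None)
    for end in range(1, len(phones) + 1):
        for start in range(max(0, end - max_len), end):
            unit = tuple(phones[start:end])
            if unit in probs and best[start][0] > -math.inf:
                score = best[start][0] + math.log(probs[unit])
                if score > best[end][0]:
                    best[end] = (score, start)
    units, end = [], len(phones)
    while end > 0:  # follow backpointers from the end of the utterance
        start = best[end][1]
        units.append(tuple(phones[start:end]))
        end = start
    return units[::-1]


def normalize(counts):
    total = sum(counts.values())
    return {unit: c / total for unit, c in counts.items()}


def train(corpus, iterations=3, min_count=2, max_len=MAX_LEN):
    """Iterate: estimate probabilities, re-segment, recount, prune."""
    phone_set = {p for phones in corpus for p in phones}
    counts = init_counts(corpus, max_len)
    for _ in range(iterations):
        probs = normalize(counts)
        recount = Counter(u for phones in corpus
                          for u in segment(phones, probs, max_len))
        counts = Counter({u: c for u, c in recount.items() if c >= min_count})
        for p in phone_set:  # floor for single phones (sketch assumption)
            counts[(p,)] = max(counts[(p,)], 1)
    return normalize(counts)
```

Calling `segment(phones, train(corpus))` then yields the multigram sequence for a phone string; concatenating the returned units always reproduces the input.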

  16. Constrained multigrams • nosil – silence (sil) is not part of a multigram unit. • noxwrd – word-boundary information is added to the multigram unit. Term (word representation): PRIME MINISTER. Term pronunciation: p r ay m m ih n ih s t axr. Term (subword representation): *p-r-ay m* *m-ih-n ih-s t-axr*

  17. Results • Subword search can process OOV terms. • Subword search is not as accurate as word search for in-vocabulary terms. • Subword search consumes more index space. => A combination of word and subword search is needed.

  18. Agenda • Word-based STD, OOV problem, subwords • Experiments • Sub-word units • Hybrid word-subword system • What can we do with OOVs • Conclusion

  19. Parallel word-subword … works, but requires maintaining and running two systems.

  20. Hybrid word-subword

  21. Implementation by composition of networks

  22. Multigram dictionary for hybrid system • For the hybrid system, phone multigrams must not be trained on utterances. • Phone multigrams are trained on a dictionary. • Experiments compared the LVCSR dictionary, a big dictionary and an OOV dictionary.

  23. Results – different configurations • Pruning factors affect memory consumption, index size, real-time (RT) factor … • “Reasonable system”: • ~2.5x slower than the word system • ~2.5x bigger index than the word system • Matches the accuracy of the word system for in-vocabulary (IV) terms • OOVs are found.

  24. Agenda • Word-based STD, OOV problem, subwords • Experiments • Sub-word units • Hybrid word-subword system • What can we do with OOVs • Conclusion

  25. OOV detection by the hybrid system: the subword confidence measure is compared to a threshold => detection of OOVs
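
As a minimal illustration of the thresholding step, assume the hybrid recognizer output is a list of tokens, each flagged as a word or a subword unit and carrying a confidence in [0, 1]. The `Token` layout, the example values and the default threshold of 0.5 are all hypothetical; the slides do not specify them.

```python
from dataclasses import dataclass


@dataclass
class Token:
    label: str         # word, or hyphen-joined phones for a subword unit
    is_subword: bool   # True if the token came from the subword part of the hybrid
    confidence: float  # confidence measure in [0, 1] (assumed scale)


def detect_oovs(tokens, threshold=0.5):
    """Return subword tokens whose confidence reaches the threshold."""
    return [t for t in tokens if t.is_subword and t.confidence >= threshold]


hypothesis = [
    Token("OFFICE", False, 0.9),
    Token("m-ae-k-s", True, 0.7),
    Token("ax", True, 0.2),
]
oov_candidates = detect_oovs(hypothesis)  # keeps only the "m-ae-k-s" token
```

Raising the threshold trades missed OOVs for fewer false detections.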

  26. OOV recovery: phoneme-to-grapheme (P2G) conversion is used to derive the word form of a detected OOV

  27. Alignment error model • Some detected OOVs could even be converted back to in-vocabulary words! • But the phone pronunciation in the 1-best output is not ideal… • … hence an alignment error model. • Its parameters (probabilities of deletion, insertion, substitution) are trained from data. • It can process a dictionary and look up detected OOVs.
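
One simple way to realize such a model is a probabilistic edit distance. The sketch below scores the best alignment between a recognized phone string and a dictionary pronunciation and picks the best-scoring entry. It is an assumption-laden illustration: the fixed probabilities stand in for parameters that the slide says are trained from data, substitutions are treated uniformly, and the function names and example pronunciations are hypothetical.

```python
import math


def alignment_log_prob(recognized, reference,
                       p_ok=0.8, p_sub=0.05, p_del=0.05, p_ins=0.05):
    """Log-probability of the best alignment of `reference` to `recognized`."""
    rows, cols = len(reference) + 1, len(recognized) + 1
    score = [[-math.inf] * cols for _ in range(rows)]
    score[0][0] = 0.0
    for i in range(rows):
        for j in range(cols):
            if i > 0:  # reference phone was deleted by the recognizer
                score[i][j] = max(score[i][j], score[i - 1][j] + math.log(p_del))
            if j > 0:  # recognizer inserted a spurious phone
                score[i][j] = max(score[i][j], score[i][j - 1] + math.log(p_ins))
            if i > 0 and j > 0:  # match or substitution
                p = p_ok if reference[i - 1] == recognized[j - 1] else p_sub
                score[i][j] = max(score[i][j], score[i - 1][j - 1] + math.log(p))
    return score[-1][-1]


def best_dictionary_match(recognized, dictionary):
    """Dictionary word whose pronunciation aligns best with the recognized phones."""
    return max(dictionary,
               key=lambda word: alignment_log_prob(recognized, dictionary[word]))


# Illustrative pronunciations only.
toy_dictionary = {
    "ALCOHOL": "ae l k ax hh aa l".split(),
    "OFFICE": "aa f ax s".split(),
}
match = best_dictionary_match("ae l k ax hh ao l".split(), toy_dictionary)
```

The same score can serve as a similarity measure between two detected OOVs, since it tolerates the phone errors typical of 1-best output.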

  28. Going more complex … A WFST can be constructed that accounts for • Sequences of in-vocabulary words • In-vocabulary words + common pre- and suffixes • OOVs • And combinations … m ey sh en -> INFORMATION; ae l k ax hh aa l ih z em (ALCOHOLISM) -> ALCOHOL / ISM; aa f ax s m ae k s (‘Office Max’) -> OFFICE OOV1572

  29. OOV clustering • The alignment model allows the similarity of detected OOVs to be evaluated. • Clustering is therefore possible.

  30. Agenda • Word-based STD, OOV problem, subwords • Experiments • Sub-word units • Hybrid word-subword system • What can we do with OOVs • Conclusion

  31. Conclusion • A subword system with constrained multigrams gives very good STD performance and is OOV tolerant. • The improved hybrid word-subword system was evaluated from the point of view of both STD accuracy and real application. • The hybrid system brings a better accuracy/size ratio and is faster than the standalone system. • It works well in a real indexing & search engine. • With a hybrid system, we can • Recover OOVs (simple P2G or a more elaborate model) • Measure similarity of OOVs • Cluster them, find re-occurring ones, update the vocabulary.

  32. Reading and playing with • Igor Szöke: Hybrid word-subword spoken term detection, Ph.D. thesis, Brno University of Technology, Oct 2010. • Stefan Kombrink, Mirko Hannemann, Lukáš Burget, and Hynek Heřmanský: Recovery of Rare Words in Lecture Speech, in Proc. Text, Speech and Dialogue (TSD) 2010, Brno, 2010. • Mirko Hannemann, Stefan Kombrink, Martin Karafiát, and Lukáš Burget: Similarity Scoring for Recognizing Repeated Out-of-Vocabulary Words, in Proc. Interspeech 2010, Makuhari, Japan, 2010. • … ‘Publications’ section of http://speech.fit.vutbr.cz/ • http://www.superlectures.com/odyssey/

  33. Thank you for your attention
