Word-subword based keyword spotting with implications in OOV detection
Jan “Honza” Černocký, Igor Szöke, Mirko Hannemann, Stefan Kombrink
Brno University of Technology, BUT Speech@FIT
44th Asilomar Conference on Signals, Systems and Computers, 8.11.2010
Agenda
• Word-based STD, OOV problem, subwords
• Experiments
• Sub-word units
• Hybrid word-subword system
• What can we do with OOVs
• Conclusion
ASILOMAR SS & C Černocký, Szöke, Hannemann, Kombrink 8.11.2010
Goal of STD and glossary of terms
Goal: detect keywords or key phrases in input speech. For each detection, output:
• Identity
• Position
• Score
Glossary
• Large Vocabulary Continuous Speech Recognizer (LVCSR): a system converting speech into text.
• Out-of-vocabulary (OOV): a word that is not in the LVCSR vocabulary.
• Term: a textual entry consisting of one or more words in sequence.
• Spoken Term Detection (STD): a way to search for a term in spoken data.
• Subword(s): unit(s) that are parts of words (phones, syllables, automatically derived units, etc.).
Word-based STD
• Thanks to the language model, word-based STD systems reach better accuracy than purely acoustic ones.
Implementation
• The term is searched for in the recognition lattice.
• This allows the posterior probability of a term to be estimated.
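How a term posterior can be read off a lattice is sketched below, assuming a toy lattice given as a list of arcs `(src, dst, label, log_score)` with topologically numbered nodes. The lattice, its scores and the function names are illustrative assumptions, not taken from the slides. The posterior of an arc is the probability mass of all paths through it divided by the mass of all paths (forward-backward).

```python
import math
from collections import defaultdict

def logadd(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def arc_posteriors(arcs, start, final):
    """Forward-backward over a lattice.

    arcs: list of (src, dst, label, log_score); node ids are integers
    in topological order (src < dst). Returns (src, dst, label, posterior).
    """
    alpha = defaultdict(lambda: -math.inf)  # log mass of paths start -> node
    beta = defaultdict(lambda: -math.inf)   # log mass of paths node -> final
    alpha[start] = 0.0
    beta[final] = 0.0
    # Forward pass: arcs ordered by source node.
    for src, dst, _, score in sorted(arcs, key=lambda a: a[0]):
        alpha[dst] = logadd(alpha[dst], alpha[src] + score)
    # Backward pass: arcs ordered by destination node, last first.
    for src, dst, _, score in sorted(arcs, key=lambda a: -a[1]):
        beta[src] = logadd(beta[src], score + beta[dst])
    total = alpha[final]
    return [(src, dst, label, math.exp(alpha[src] + score + beta[dst] - total))
            for src, dst, label, score in arcs]

# Toy lattice with two competing paths (scores are made up).
arcs = [
    (0, 1, "AN", math.log(0.7)),
    (0, 2, "AMEX", math.log(0.3)),
    (1, 3, "EXAMPLE", math.log(0.8)),
    (2, 3, "APPLE", math.log(1.0)),
]
post = {label: p for _, _, label, p in arc_posteriors(arcs, start=0, final=3)}
print(post)
```

A multi-word term would additionally require finding consecutive arcs; this sketch covers only the single-arc case.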
The OOV problem
REF: THIS IS AN EXAMPLE OF RECOGNIZER OUTPUT
REC: THIS IS AMEX APPLE OF RECOGNIZER OUTPUT
• One OOV causes several errors:
  • The OOV cannot be found (in the output of LVCSR).
  • The OOV impairs recognition of neighboring words.
  • An OOV usually carries a lot of information (named entity).
• We need to handle OOVs!
  • Word accuracy.
  • Spoken term detection accuracy.
  • Practical (memory, CPU, index size, etc.).
Answer to the OOV problem: sub-word STD
• A subword recognizer is built (its output is a subword lattice).
• The term is converted from words to a sequence of subwords.
• This sequence is searched for in the subword lattice.
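The three steps above can be sketched in a simplified form, assuming phones as the subword unit and a 1-best phone string instead of a full lattice. The lexicon, the phone stream and the function names are illustrative assumptions. For a real OOV term the pronunciation would have to come from a grapheme-to-phoneme converter rather than a lexicon lookup.

```python
def term_to_subwords(term, lexicon):
    """Convert a (possibly multi-word) term to a flat phone sequence."""
    phones = []
    for word in term.upper().split():
        phones.extend(lexicon[word])
    return phones

def find_sequence(query, stream):
    """Return all start indices where `query` occurs in `stream`."""
    n = len(query)
    return [i for i in range(len(stream) - n + 1) if stream[i:i + n] == query]

# Hypothetical pronunciation lexicon (pronunciations as on the multigram slide).
lexicon = {
    "PRIME": ["p", "r", "ay", "m"],
    "MINISTER": ["m", "ih", "n", "ih", "s", "t", "axr"],
}

# Hypothetical 1-best output of a phone recognizer.
stream = ["dh", "ax", "p", "r", "ay", "m", "m", "ih", "n", "ih", "s", "t",
          "axr", "s", "eh", "d"]

query = term_to_subwords("prime minister", lexicon)
hits = find_sequence(query, stream)
print(hits)
```

Exact matching on a 1-best string is only a stand-in; a lattice search would also consider alternative hypotheses and attach a score to each hit.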
Evaluation - TWV
• Defined by NIST for the NIST STD 2006 evaluation:
  • one number
  • higher is better
  • depends on score normalization
• Requires a full STD system
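For reference, the metric as it is commonly stated for the NIST STD 2006 evaluation (this formula is not on the slide and the symbol names are chosen here): for a decision threshold θ, the misses and false alarms of each term t are combined and averaged over all terms that occur in the reference.

```latex
\mathrm{TWV}(\theta) = 1 - \frac{1}{|\mathcal{T}|} \sum_{t \in \mathcal{T}}
  \left[ P_{\mathrm{miss}}(t,\theta) + \beta \, P_{\mathrm{FA}}(t,\theta) \right]

P_{\mathrm{miss}}(t,\theta) = 1 - \frac{N_{\mathrm{corr}}(t,\theta)}{N_{\mathrm{true}}(t)},
\qquad
P_{\mathrm{FA}}(t,\theta) = \frac{N_{\mathrm{FA}}(t,\theta)}{T_{\mathrm{speech}} - N_{\mathrm{true}}(t)},
\qquad
\beta = 999.9
```

Here N_corr and N_FA are the numbers of correct and false detections of term t above θ, N_true is the number of true occurrences, and T_speech is the duration of the speech in seconds. A single global θ is used, which is why the value depends on how scores are normalized across terms.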
Normalization-independent evaluation - UBTWV
• UBTWV - Upper Bound Term Weighted Value
• Finds the optimum threshold for each term
• one number
• higher is better
• Independent of normalization
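A minimal sketch of the per-term optimum threshold, assuming the usual NIST STD 2006 term value 1 - (P_miss + β·P_FA) with β = 999.9 and assuming distinct detection scores within a term. The data and function names are illustrative, not the authors' scoring tool.

```python
BETA = 999.9  # weight of false alarms, as in the NIST STD 2006 TWV definition

def term_value(n_corr, n_fa, n_true, speech_seconds, beta=BETA):
    """Value of one term at one operating point."""
    p_miss = 1.0 - n_corr / n_true
    p_fa = n_fa / (speech_seconds - n_true)
    return 1.0 - (p_miss + beta * p_fa)

def best_term_value(detections, n_true, speech_seconds):
    """Maximum term value over all thresholds for this term.

    detections: list of (score, is_hit). Scores are assumed distinct.
    """
    best = term_value(0, 0, n_true, speech_seconds)  # threshold above all scores
    n_corr = n_fa = 0
    for _, is_hit in sorted(detections, key=lambda d: -d[0]):
        if is_hit:
            n_corr += 1
        else:
            n_fa += 1
        best = max(best, term_value(n_corr, n_fa, n_true, speech_seconds))
    return best

def ubtwv(terms, speech_seconds):
    """Average of per-term optimum values; terms: {name: (detections, n_true)}."""
    values = [best_term_value(d, n, speech_seconds) for d, n in terms.values()]
    return sum(values) / len(values)

# Made-up detections for two terms in 3 h (10800 s) of speech.
terms = {
    "A": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "B": ([(0.6, False)], 1),
}
score = ubtwv(terms, speech_seconds=10800)
print(round(score, 4))
```

Because each term gets its own threshold, the result does not change under any per-term monotonic rescaling of the scores.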
Data
• NIST STD 2006 evaluations.
• 3h of English telephone conversations.
• 373 terms, 1-4 words long, occurring 4737/196 times.
Recognizer I.
• LVCSR developed in the AMI/AMIDA project.
• State-of-the-art system including VTLN, MPE, posterior features, SAT, 3 passes.
• Acoustic models trained on 278h of speech.
• Language model trained on 977M word tokens (50k vocabulary).
• Dictionary pruned to generate OOVs -> WRDRED.
• Word accuracy: 69.04%.
Results
• Words
• Words converted to phones
• Phone recognizer
Phones are too small => longer units are needed.
Better subwords - phone multigrams
• Statistics of phone n-grams (up to length 6) are collected from training data (phone transcriptions of speech).
• Probabilities of all units are estimated.
• Training data are segmented into the most probable sequence of multigrams.
• Statistics are recomputed and rarely occurring units are deleted. Several iterations are run.
• An n-gram language model is estimated on top of the multigram segmentation of the training data.
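The segmentation step can be sketched as a Viterbi search over all ways of cutting a phone string into known units, assuming unit probabilities are already estimated. The unit inventory and its probabilities below are made up, and the re-estimation and pruning iterations are not shown.

```python
import math

def viterbi_segment(phones, unit_logprob, max_len=6):
    """Most probable segmentation of `phones` into multigram units.

    unit_logprob: dict mapping a tuple of phones to its log probability.
    Returns (list of units joined by '-', total log probability).
    """
    n = len(phones)
    best = [-math.inf] * (n + 1)  # best[i]: best log prob of phones[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last unit
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            unit = tuple(phones[start:end])
            if unit in unit_logprob and best[start] > -math.inf:
                score = best[start] + unit_logprob[unit]
                if score > best[end]:
                    best[end], back[end] = score, start
    if best[n] == -math.inf:
        raise ValueError("no segmentation with the given unit inventory")
    units, end = [], n
    while end > 0:
        start = back[end]
        units.append("-".join(phones[start:end]))
        end = start
    return units[::-1], best[n]

# Hypothetical unit inventory with made-up probabilities.
probs = {("m", "ih", "n"): 0.2, ("ih", "s"): 0.1, ("t", "axr"): 0.1}
for p in ["m", "ih", "n", "s", "t", "axr"]:
    probs[(p,)] = 0.05
unit_logprob = {u: math.log(p) for u, p in probs.items()}

units, logp = viterbi_segment("m ih n ih s t axr".split(), unit_logprob)
print(units)
```

The `max_len=6` default mirrors the maximum n-gram length mentioned on the slide.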
Constrained multigrams
• nosil: sil is not part of a multigram unit.
• noxwrd: word-boundary information is added to multigram units.
Term (word representation): PRIME MINISTER
Term pronunciation: p r ay m m ih n ih s t axr
Term (subword representation): *p-r-ay m* *m-ih-n ih-s t-axr*
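The subword representation above can be reproduced with a small formatting sketch. It assumes that `*` marks a word boundary on the first and last unit of each word and that each word has already been segmented into multigrams; both the reading of the notation and the function name are assumptions.

```python
def mark_word_boundaries(words_as_units):
    """Join per-word multigram units, marking word starts and ends with '*'."""
    out = []
    for units in words_as_units:
        marked = list(units)
        marked[0] = "*" + marked[0]    # word-initial unit
        marked[-1] = marked[-1] + "*"  # word-final unit
        out.append(" ".join(marked))
    return " ".join(out)

# PRIME MINISTER, each word segmented separately so that no unit crosses a word boundary.
term_units = [["p-r-ay", "m"], ["m-ih-n", "ih-s", "t-axr"]]
representation = mark_word_boundaries(term_units)
print(representation)
```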
Results
• Subword search can process OOV terms.
• Subword search is not as accurate as word search for in-vocabulary terms.
• Subword search consumes more index space.
=> Word and subword searches need to be combined.
Parallel word-subword
• … works, but requires maintaining and running 2 systems.
Hybrid word-subword
Implementation by composition of networks
Multigram dictionary for hybrid system
• For the hybrid system, phone multigrams must not be trained on utterances.
• Phone multigrams are trained on a dictionary.
• Experiments compared LVCSR vs. big vs. OOV dictionaries.
Results - different configurations
• Pruning factors affect memory consumption, index size, RT factor …
• “Reasonable system”:
  • ~2.5x slower than the word system
  • ~2.5x bigger index than the word system
  • Matches the accuracy of the word system for IV terms
  • OOVs are found.
OOV detection by the hybrid system
• The subword confidence measure is compared to a threshold => detection of OOVs
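A minimal sketch of the thresholding, assuming the hybrid recognizer's output has been reduced to time segments with a subword confidence each (high when the subword part of the hybrid network explains the segment well). The segment format, the threshold value and the merging of adjacent segments are illustrative assumptions.

```python
def detect_oovs(segments, threshold=0.5, eps=1e-6):
    """Flag segments whose subword confidence exceeds the threshold.

    segments: list of (start_time, end_time, subword_confidence), time-ordered.
    Adjacent flagged segments are merged into one OOV region.
    """
    regions = []
    for start, end, conf in segments:
        if conf <= threshold:
            continue
        if regions and abs(start - regions[-1][1]) < eps:
            regions[-1] = (regions[-1][0], end)  # extend the previous region
        else:
            regions.append((start, end))
    return regions

# Made-up confidences for four consecutive segments.
segments = [(0.0, 0.4, 0.05), (0.4, 0.9, 0.8), (0.9, 1.3, 0.7), (1.3, 2.0, 0.1)]
oov_regions = detect_oovs(segments, threshold=0.5)
print(oov_regions)
```

Raising the threshold trades missed OOVs for fewer false detections.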
OOV recovery
• Phoneme-to-grapheme (P2G) conversion is used to derive the word form of a detected OOV
Alignment error model
• Some detected OOVs could even be converted back to in-vocabulary words!
• But the phone pronunciation in the 1-best output is not ideal …
• … hence an alignment error model.
• Parameters (probabilities of deletion, insertion, substitution) are trained from data.
• It can process a dictionary and look up detected OOVs.
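The lookup can be sketched as a weighted edit distance whose costs are negative log probabilities of matches, substitutions, deletions and insertions. Here the probabilities are context-independent and hand-set, whereas the slide says they are trained from data. The dictionary pronunciations are illustrative assumptions.

```python
import math

def alignment_cost(hyp, ref, p_ok=0.8, p_sub=0.1, p_del=0.05, p_ins=0.05):
    """Negative log probability of the best alignment of hyp to ref."""
    c_ok, c_sub = -math.log(p_ok), -math.log(p_sub)
    c_del, c_ins = -math.log(p_del), -math.log(p_ins)
    # d[i][j]: cost of aligning hyp[:i] with ref[:j]
    d = [[0.0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(1, len(hyp) + 1):
        d[i][0] = d[i - 1][0] + c_ins          # extra phone in hyp
    for j in range(1, len(ref) + 1):
        d[0][j] = d[0][j - 1] + c_del          # ref phone missing in hyp
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            diag = c_ok if hyp[i - 1] == ref[j - 1] else c_sub
            d[i][j] = min(d[i - 1][j - 1] + diag,
                          d[i - 1][j] + c_ins,
                          d[i][j - 1] + c_del)
    return d[-1][-1]

def lookup(hyp, dictionary):
    """Return the dictionary word whose pronunciation aligns best with hyp."""
    return min(dictionary, key=lambda w: alignment_cost(hyp, dictionary[w]))

# Hypothetical dictionary pronunciations.
dictionary = {
    "ALCOHOLISM": "ae l k ax hh aa l ih z ax m".split(),
    "ALCOHOL": "ae l k ax hh aa l".split(),
    "OFFICE": "aa f ax s".split(),
}
detected = "ae l k ax hh aa l ih z em".split()  # recognized phones of a detected OOV
best = lookup(detected, dictionary)
print(best)
```

In practice a cost threshold would be needed as well, so that a detected OOV with no close dictionary entry stays an OOV.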
Going more complex …
A wFST can be constructed accounting for:
• Sequences of in-vocabulary words
• In-vocabulary words + common prefixes and suffixes
• OOVs
• And combinations …
m ey sh en -> INFORMATION
ae l k ax hh aa l ih z em (ALCOHOLISM) -> ALCOHOL / ISM
aa f ax s m ae k s (’Office Max’) -> OFFICE OOV1572
OOV clustering
• The alignment model allows similarity to be evaluated
• Clustering is possible
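A minimal sketch of clustering detected OOVs by similarity. Plain length-normalized Levenshtein distance stands in for the alignment model, and a greedy single-pass scheme stands in for whatever clustering the authors used. The phone strings and the threshold are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # match or substitution
        prev = cur
    return prev[-1]

def cluster(items, max_norm_dist=0.3):
    """Greedy clustering: join the first cluster whose first member is close enough."""
    clusters = []
    for item in items:
        for members in clusters:
            rep = members[0]
            dist = edit_distance(item, rep) / max(len(item), len(rep))
            if dist <= max_norm_dist:
                members.append(item)
                break
        else:
            clusters.append([item])
    return clusters

# Hypothetical phone strings of detected OOV regions.
detected = [
    "m ae k s".split(),
    "k aa m b r ih n k".split(),
    "m eh k s".split(),
    "k ao m b r ih n k".split(),
]
clusters = cluster(detected)
print([[" ".join(m) for m in c] for c in clusters])
```

Clusters with several members point to re-occurring OOVs, which are candidates for a vocabulary update.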
Conclusion
• Subword system with constrained multigrams: very good STD performance and an OOV-tolerant system.
• Improved hybrid word-subword system tested from the STD accuracy and real-application points of view.
• The hybrid system brings a better accuracy/size ratio and is faster than the standalone system.
• It works well in a real indexing & search engine.
• With a hybrid system, we can
  • Recover OOVs (simple P2G or a more elaborate model)
  • Measure similarity of OOVs
  • Cluster them, find re-occurring ones, update the vocabulary.
Reading and playing with
• Igor Szöke: Hybrid word-subword spoken term detection, Ph.D. thesis, Brno University of Technology, Oct 2010.
• Stefan Kombrink, Mirko Hannemann, Lukáš Burget, and Hynek Heřmanský: Recovery of Rare Words in Lecture Speech, in Proc. Text, Speech and Dialogue (TSD) 2010, Brno, 2010.
• Mirko Hannemann, Stefan Kombrink, Martin Karafiát, and Lukáš Burget: Similarity Scoring for Recognizing Repeated Out-of-Vocabulary Words, in Proc. Interspeech 2010, Makuhari, Japan, 2010.
• … ‘Publications’ section of http://speech.fit.vutbr.cz/
• http://www.superlectures.com/odyssey/
Thank you for your attention