Answer Spotting
Herbert Gish and Owen Kimball, June 11, 2002
Project Goals
• Primary Objectives
  • Develop answer-spotting technology to provide analysts with the best answers available from a spontaneous speech database
  • Develop the application for multiple languages, potentially with limited resources
    • Low-density languages
• Application Features
  • Explain to the user the basis for decisions
  • Export the semantic components of an answer to a multimedia system
  • Account for variability in resources when extracting information
  • Enable rapid deployment in new languages
BBN Approach
• Topic-dependent language models
• Semantic-category class grammars
• Unsupervised training methods
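The slides do not spell out how the semantic-category class grammars enter the language model. As an illustration only, a standard class-based bigram assigns each word w_i a class c_i and factors the word bigram through the classes:

```latex
P(w_i \mid w_{i-1}) \approx P(c_i \mid c_{i-1}) \, P(w_i \mid c_i)
```

Words in the same semantic category then share the class-transition statistics P(c_i | c_{i-1}), so only the within-class probabilities P(w_i | c_i) must be estimated per word, which helps when examples of a category are sparse.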
Query Formation
• A query for the conversational speech domain addresses:
  • Topic
    • Domains of interest to the analyst
  • Semantic categories/classes
    • Key topic components
  • Keywords/Phrases
    • The elements of the semantic categories
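A query of this shape maps naturally onto a small data structure. The sketch below is a minimal Python version; the class name `Query`, its fields, and the example categories for the Switchboard "Buying a car" topic are hypothetical, not part of the described system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Query:
    """Analyst query: a topic plus semantic categories and their keywords/phrases."""

    topic: str  # domain of interest to the analyst
    # Maps each semantic category name to its user-defined keywords/phrases.
    categories: Dict[str, List[str]] = field(default_factory=dict)

    def keywords(self) -> List[str]:
        """All keywords/phrases across categories, in insertion order."""
        return [kw for kws in self.categories.values() for kw in kws]

    def category_of(self, phrase: str) -> Optional[str]:
        """Name of the first category listing `phrase` (case-insensitive), else None."""
        target = phrase.lower()
        for name, kws in self.categories.items():
            if any(kw.lower() == target for kw in kws):
                return name
        return None


# Hypothetical example for the Switchboard "Buying a car" topic.
CAR_QUERY = Query(
    topic="Buying a car",
    categories={
        "CAR_MAKE": ["honda", "ford"],
        "PRICE": ["cheap", "expensive"],
    },
)

print(CAR_QUERY.keywords())           # ['honda', 'ford', 'cheap', 'expensive']
print(CAR_QUERY.category_of("Ford"))  # CAR_MAKE
```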
Answer Spotting Approach
• Recognize semantic-category-specific language activity
  • A generalization of word and phrase spotting
• Integrate topic search into the speech recognition process to find the best answers
  • Use topic-relevant language model(s) to select relevant data
  • Incorporate semantic classification of words or phrases into the language model used in recognition
  • Requires minimal resources and provides the best performance
• Train topic classifiers without supervision
• Post-process speech recognition output to assemble the semantic components of the answer
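The final post-processing step can be sketched as grouping category-tagged recognizer output into answer components. This assumes the category identifier emits one tag per word, with a hypothetical `NOT` tag for words outside every category; the function name and tag names are illustrative.

```python
from collections import defaultdict
from itertools import groupby
from typing import Dict, List, Tuple

NOT_A_CLASS = "NOT"  # hypothetical tag for words outside every semantic category


def assemble_answer(tagged: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect answer components from (word, tag) pairs produced after recognition.

    Consecutive words sharing a tag are joined into one phrase; words tagged
    NOT_A_CLASS are dropped.
    """
    answer: Dict[str, List[str]] = defaultdict(list)
    for tag, run in groupby(tagged, key=lambda pair: pair[1]):
        if tag != NOT_A_CLASS:
            answer[tag].append(" ".join(word for word, _ in run))
    return dict(answer)


tagged = [
    ("i", "NOT"), ("want", "NOT"), ("a", "NOT"),
    ("honda", "CAR_MAKE"), ("civic", "CAR_MAKE"),
    ("under", "NOT"), ("ten", "PRICE"), ("thousand", "PRICE"),
]
print(assemble_answer(tagged))
# {'CAR_MAKE': ['honda civic'], 'PRICE': ['ten thousand']}
```

Because runs are split only where the tag changes, two adjacent mentions of the same category would merge into one phrase; a tagger that marks phrase boundaries (for example, BIO tags) avoids this.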
Choice of Corpora
• Desired Corpora Features
  • Spontaneous (telephone) speech
  • Conversations between people
  • Consistent query formation and answer representation from data
• Selected Corpora
  • Switchboard
    • Spontaneous telephone conversations between strangers
    • Topic-driven conversations
    • Abundant amounts of transcribed data
  • Callhome
    • Spontaneous telephone conversations between family members
    • Corpora available in multiple languages: Spanish, Mandarin and Arabic
Query Formation for Switchboard
• Topics
  • Selected 5 diverse topics
  • Topic descriptions: Buying a car, Credit cards, News media, Vacation spots, Music
  • The amount of data for each topic varies from 30 to 60 conversations
• Semantic Categories/Classes
  • Defined a set of semantic categories for each topic
  • At least 5 categories were picked per topic
  • Manual annotation of semantic categories; no syntactic information was used in annotation
• User-Defined Keywords/Phrases
Topic: Buying a Car
"What kind of car do you think you might buy next? What sorts of things will enter your decision? See if your requirements and the other caller's requirements are similar."
Topic: News Media
"Discuss how you and the caller keep up with current events."
System Components
• Recognizer
  • State-of-the-art Byblos system
  • Real-time or near-real-time performance
• Topic Identifier
  • Parallel language model structure in the recognizer that separates query-topic data from non-topic data
  • Topic and text integrator that uses language model information and word confidences to filter relevant text
• Category Identifier
  • Categories integrated into the language model, or
  • A separate component, for example, Identifinder
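The topic identifier's behavior can be approximated with two models scored in parallel: a topic model and a background (non-topic) model. In this sketch, add-one smoothed unigram models stand in for the recognizer's n-gram language models, a segment counts as on-topic when its summed log-likelihood ratio is positive, and words below a confidence threshold are then dropped. The class and function names, the zero decision threshold, `min_conf=0.5`, and the toy text are all assumptions.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple


class UnigramLM:
    """Add-one smoothed unigram model, a stand-in for a real n-gram LM."""

    def __init__(self, tokens: List[str], vocab: Set[str]):
        self.counts = Counter(tokens)
        self.total = len(tokens)
        self.v = len(vocab) + 1  # +1 for an <unk> slot shared by unseen words

    def logprob(self, word: str) -> float:
        return math.log((self.counts[word] + 1) / (self.total + self.v))


def topic_llr(words: Iterable[str], topic: UnigramLM, background: UnigramLM) -> float:
    """Summed per-word log-likelihood ratio: topic model vs. background model."""
    return sum(topic.logprob(w) - background.logprob(w) for w in words)


def filter_relevant(
    segment: List[Tuple[str, float]],
    topic: UnigramLM,
    background: UnigramLM,
    min_conf: float = 0.5,
) -> List[str]:
    """Keep confident words from segments the topic model explains better."""
    if topic_llr((w for w, _ in segment), topic, background) <= 0:
        return []  # segment looks like non-topic data
    return [w for w, conf in segment if conf >= min_conf]


# Toy training text (hypothetical).
topic_tokens = "buy car dealer price car honda".split()
background_tokens = "weather today rain news game".split()
vocab = set(topic_tokens) | set(background_tokens)
TOPIC_LM = UnigramLM(topic_tokens, vocab)
BACKGROUND_LM = UnigramLM(background_tokens, vocab)

# Recognizer output as (word, confidence) pairs.
on_topic = [("the", 0.9), ("car", 0.8), ("dealer", 0.4), ("price", 0.7)]
off_topic = [("rain", 0.9), ("today", 0.9)]
print(filter_relevant(on_topic, TOPIC_LM, BACKGROUND_LM))   # ['the', 'car', 'price']
print(filter_relevant(off_topic, TOPIC_LM, BACKGROUND_LM))  # []
```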
An Alternate Configuration
• Recognizer employs a standard n-gram language model
• Topics identified after recognition
• Semantic content extracted after recognition
• Can provide a baseline for the original configuration
• The choice for low-WER recognition
[Diagram blocks: Speech, Recognizer, Parallel Language Model, Topic Id, Semantic Information (Identifinder)]
Category Identification - Identifinder
[Diagram: HMM with states Start of sentence, Semantic Class 1 through Semantic Class N, Not-a-Semantic-Class, End of sentence]
• Identifinder is an HMM with internal states defined by the semantic classes and a single “not-a-semantic-class” state.
• Each state generates words conditioned on the previous state as well as the previous word.
• Word features can also be used in addition to the word identity.
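The slide describes the model only at a high level. The sketch below is a simplified Viterbi decoder for an HMM of that shape, assuming transition probabilities P(s_t | s_{t-1}) and a state-specific word bigram P(w_t | s_t, w_{t-1}) stored in plain dictionaries, with a fixed floor for unseen events. It is not Identifinder's actual parameterization or back-off scheme; the state names, probabilities, and `FLOOR` value are hypothetical.

```python
import math
from typing import Dict, List, Tuple

START, END = "<s>", "</s>"
FLOOR = 1e-6  # hypothetical probability for unseen transitions/emissions


def viterbi(
    words: List[str],
    states: List[str],
    trans: Dict[Tuple[str, str], float],       # (prev_state, state) -> P(state | prev_state)
    emit: Dict[Tuple[str, str, str], float],   # (state, prev_word, word) -> P(word | state, prev_word)
) -> List[str]:
    """Most likely state (semantic class) sequence for `words`."""
    if not words:
        return []

    def lp(table, key):
        return math.log(table.get(key, FLOOR))

    # best[s] = (log prob of the best path ending in state s, that path)
    best = {s: (lp(trans, (START, s)) + lp(emit, (s, START, words[0])), [s]) for s in states}
    for i in range(1, len(words)):
        prev_word, word = words[i - 1], words[i]
        step = {}
        for s in states:
            score, path = max((best[p][0] + lp(trans, (p, s)), best[p][1]) for p in states)
            step[s] = (score + lp(emit, (s, prev_word, word)), path + [s])
        best = step
    # Close the sentence with a transition into the end-of-sentence state.
    final = max((sc + lp(trans, (s, END)), p) for s, (sc, p) in best.items())
    return final[1]


# Toy model (hypothetical numbers).
STATES = ["NOT", "CAR_MAKE"]
TRANS = {
    (START, "NOT"): 0.9, (START, "CAR_MAKE"): 0.1,
    ("NOT", "NOT"): 0.8, ("NOT", "CAR_MAKE"): 0.1, ("NOT", END): 0.1,
    ("CAR_MAKE", "CAR_MAKE"): 0.5, ("CAR_MAKE", "NOT"): 0.3, ("CAR_MAKE", END): 0.2,
}
EMIT = {
    ("NOT", START, "i"): 0.2, ("NOT", "i", "want"): 0.3, ("NOT", "want", "a"): 0.3,
    ("CAR_MAKE", "a", "honda"): 0.2, ("CAR_MAKE", "honda", "civic"): 0.4,
}
WORDS = ["i", "want", "a", "honda", "civic"]
print(viterbi(WORDS, STATES, TRANS, EMIT))
# ['NOT', 'NOT', 'NOT', 'CAR_MAKE', 'CAR_MAKE']
```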
Portability Across Languages
• Use Callhome corpora for testing system capabilities
  • Callhome English has conversations between family members
  • Topics range from family events to immigration issues
• Callhome is available in multiple languages
  • Languages that can be tested include Spanish, Mandarin and Arabic
  • Limited data and linguistic resources are available in these languages, posing additional technical challenges
Two Modes of Operation
• Significant Resources
  • Moderate-WER (30%-40%) LVCSR available
  • Hundreds (nominally) of hours of transcribed data
  • Large phonetic dictionary
• Limited Resources
  • A few hours of transcribed data in the domain of interest
  • Dictionary limited to the training data
Using Limited Resources
• Investigate the effect of variations in data on the various system components
• Impact of a reduced number of manually annotated conversations on category identification
  • Use word clustering on other available text resources to find words that fit into the semantic classes of interest
  • Use relevance feedback techniques, where the user provides feedback that can be used to adapt the system response
• Impact of reduced transcriptions for acoustic/language modeling on recognition performance
  • Use auto-transcription techniques if additional audio data is available
  • Use available newspaper and broadcast news text to improve language modeling performance
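The slide does not name a relevance feedback method. The classic Rocchio update is shown here as one possibility: the query's term-weight vector is moved toward answers the user marked relevant and away from those marked non-relevant. The weights `alpha=1.0`, `beta=0.75`, `gamma=0.15` are commonly cited textbook defaults, not values from this project, and the example vectors are made up.

```python
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight


def rocchio(
    query: Vector,
    relevant: List[Vector],
    nonrelevant: List[Vector],
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.15,
) -> Vector:
    """q' = alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms.update(doc)
    updated: Vector = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant:
            w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        if w > 0:  # non-positive weights are dropped, a common convention
            updated[t] = w
    return updated


query = {"car": 1.0}
liked = [{"car": 1.0, "dealer": 1.0}]  # answers the user marked relevant
disliked = [{"weather": 1.0}]          # answers the user marked non-relevant
print(sorted(rocchio(query, liked, disliked).items()))
# [('car', 1.75), ('dealer', 0.75)]
```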
Using Limited Resources
• Building a system with limited data resources and/or linguistic expertise
  • Enabling rapid deployment in new languages where linguistic resources (for example, a word pronunciation dictionary or word transcriptions) are limited
  • Annotating topics and semantic categories in a new language where transcriptions are limited
Progress Overview
• Annotation completed
  • Semantic categories
• Integrated recognizer-semantics
  • Language model still being developed
• Baseline system (LVCSR/Identifinder) implemented
  • Initial experiments measuring the performance of Identifinder on semantic categories
• Topic classification in the limited-data regime
  • Topic classification with approximately 4 hours of training data
  • Technology has been transferred to a government agency
  • Classification performance with diffuse topics
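The slides do not say which topic classifier was used in the limited-data experiments. A multinomial naive Bayes classifier with add-one smoothing is a common baseline for text classification with little training data, and is sketched here under that assumption; the function names and toy transcripts are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train_nb(docs: List[Tuple[str, List[str]]]) -> Model:
    """Count topic labels and per-topic word frequencies."""
    doc_counts: Counter = Counter(label for label, _ in docs)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for label, words in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return doc_counts, word_counts, vocab


def classify(model: Model, words: List[str]) -> Optional[str]:
    """Return the topic with the highest add-one smoothed log posterior."""
    doc_counts, word_counts, vocab = model
    n_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in doc_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # log prior
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy transcripts for three Switchboard-style topics.
DOCS = [
    ("Buying a car", "dealer price honda mileage".split()),
    ("Music", "band concert guitar song".split()),
    ("Credit cards", "interest balance card debt".split()),
]
MODEL = train_nb(DOCS)
print(classify(MODEL, "the dealer gave a good price".split()))  # Buying a car
```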
Initial Experiments
• Finding the Semantic Categories
  • Employed recognition followed by Identifinder
  • Real-time Byblos recognizer trained on Switchboard
    • WER 38%
  • Trained Identifinder with annotated data from the 3 topics
  • Evaluated on manual transcriptions and after decoding
• Finding Topics with Limited Resources
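The 38% figure is a word error rate. WER is conventionally computed from a word-level Levenshtein alignment of the recognizer output against a reference transcript, (S + D + I) / N, where S, D and I are substitutions, deletions and insertions and N is the number of reference words. A minimal sketch (the example sentences are made up):

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """(substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    n, m = len(reference), len(hypothesis)
    # d[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # i deletions
    for j in range(m + 1):
        d[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution or match
            )
    return d[n][m] / n


ref = "i want to buy a new car".split()
hyp = "i want to by new car".split()
print(round(word_error_rate(ref, hyp), 3))  # 0.286 (one substitution, one deletion)
```

Because insertions are counted, WER can exceed 100% when the recognizer outputs many spurious words.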
Finding Semantic Categories (cont.)
• Comparison of performance on manually transcribed speech and after speech recognition
[Chart: Manually Transcribed vs. After Speech Recognition]
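The comparison chart did not survive extraction, and the slide does not state the metric. One common way to score category identification on both manual transcripts and recognizer output is precision, recall and F1 over (category, phrase) mentions. The sketch assumes exact string matches, so a misrecognized word inside a phrase counts as a miss; the mention format and examples are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Mention = Tuple[str, str]  # (semantic category, phrase text)


def precision_recall_f1(
    reference: List[Mention], hypothesis: List[Mention]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1, treating mentions as multisets."""
    ref, hyp = Counter(reference), Counter(hypothesis)
    correct = sum((ref & hyp).values())  # multiset intersection
    precision = correct / sum(hyp.values()) if hyp else 0.0
    recall = correct / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


reference = [("CAR_MAKE", "honda civic"), ("PRICE", "ten thousand")]
decoded = [("CAR_MAKE", "honda civic"), ("PRICE", "tin thousand")]  # recognition error
print(precision_recall_f1(reference, decoded))  # (0.5, 0.5, 0.5)
```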
Discussion
• Initial results show that semantic categories can survive recognition errors
• Experiments with limited training are giving very encouraging results
• Need to integrate the language model into the recognizer
• Explore semantic categories in the limited-data situation
• Provide confidence measures for keywords/phrases
• Investigate other methods for characterizing performance
  • Characterize performance as a function of word error rate
• Start work on a demo system