Pei-Ning Chen, NTNU CSIE SLP Lab

Effects of Query Expansion for Spoken Document Passage Retrieval • Tomoyosi Akiba, Koichiro Honda • INTERSPEECH 2011

Outline • Introduction • Passage Retrieval for Spoken Document • Query Expansion for SDR • Experiments • Conclusions

Introduction • Because confirming the content of a spoken document requires playing back its audio, browsing speech data is much more difficult and time-consuming than browsing text. • The authors apply relevance models, a query expansion method, to the spoken document passage retrieval task. They adapt the original relevance model to passage retrieval and extend it to draw on massive collections of Web documents for query expansion.

Retrieval Methods for Passage Retrieval • Using the Neighboring Context to Index the Passage • In passage retrieval, passages from the same lecture may be related to each other, whereas conventional document retrieval treats the target documents as independent of each other. • Penalizing Neighboring Retrieval Results • With context indexing, neighboring passages are liable to be retrieved together because they share the same indexing words.
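The two ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `window` and `penalty` parameters, and the greedy re-ranking strategy are all hypothetical, and the passages are assumed to come from a single lecture in temporal order.

```python
from collections import Counter


def context_index(passages, window=1):
    """Index each passage with its own words plus those of its neighbors.

    `passages` is a list of token lists from ONE lecture, in temporal order.
    `window` (hypothetical parameter) is how many neighbors on each side to include.
    """
    indexed = []
    for i in range(len(passages)):
        lo = max(0, i - window)
        hi = min(len(passages), i + window + 1)
        bag = Counter()
        for j in range(lo, hi):
            bag.update(passages[j])
        indexed.append(bag)
    return indexed


def penalize_neighbors(scores, penalty=0.5):
    """Greedy re-ranking that down-weights neighbors of already selected passages.

    `scores[i]` is the retrieval score of passage i. Each time a passage is
    selected, the scores of its not-yet-selected adjacent passages are
    multiplied by `penalty` (an assumed scheme, chosen for simplicity).
    """
    remaining = dict(enumerate(scores))
    ranked = []
    while remaining:
        # Highest score first; ties go to the earlier passage.
        best = max(remaining, key=lambda i: (remaining[i], -i))
        ranked.append(best)
        del remaining[best]
        for neighbor in (best - 1, best + 1):
            if neighbor in remaining:
                remaining[neighbor] *= penalty
    return ranked
```

With scores `[0.9, 0.8, 0.5, 0.45]`, passage 1 would rank second without the penalty; with it, passage 1 is pushed down because it is adjacent to the top-ranked passage 0.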

Query Expansion for SDR • Relevance Models • Extending Relevance Models to Context Indexing • Extending Relevance Models using Web

Linear interpolation: • the two models are linearly interpolated: • Document weighting: • the Web model is used to weight the target documents:
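The linear-interpolation variant can be illustrated as follows. This is a minimal sketch that assumes each relevance model is available as a word-to-probability dictionary; the weight `lam`, the function names, and the term-selection step are assumptions for illustration, and the slides do not give the authors' exact formulation. The document-weighting variant is not sketched here because its formula is not stated.

```python
def interpolate(p_target, p_web, lam=0.5):
    """Return lam * P_target(w) + (1 - lam) * P_web(w) for every word w.

    `p_target` and `p_web` are word -> probability dicts estimated from the
    target collection and from Web documents, respectively.
    `lam` is an assumed interpolation weight in [0, 1].
    """
    vocab = set(p_target) | set(p_web)
    return {
        w: lam * p_target.get(w, 0.0) + (1.0 - lam) * p_web.get(w, 0.0)
        for w in vocab
    }


def top_expansion_terms(model, k=3):
    """Pick the k most probable words as query expansion terms."""
    return sorted(model, key=lambda w: (-model[w], w))[:k]


target_model = {"speech": 0.6, "retrieval": 0.4}
web_model = {"speech": 0.2, "lecture": 0.8}
mixed = interpolate(target_model, web_model, lam=0.75)
expansion = top_expansion_terms(mixed, k=2)
```

If both inputs sum to one, the interpolated model also sums to one, so it can be used wherever a single relevance model is expected.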

Experiments

Conclusions • The authors applied relevance models to the spoken document passage retrieval task. • They also extended the models to take advantage of massive collections of Web documents for query expansion. • To improve the performance of the Web extension of relevance models, filtering out noisy Web documents might be necessary. • In future work, the authors plan to apply Web document filtering methods to select only the documents most related to the target documents.

Speech Indexing Using Semantic Context Inference • Chien-Lin Huang, Bin Ma, Haizhou Li and Chung-Hsien Wu • INTERSPEECH 2011

Outline • Introduction • Semantic Context Inference • Experiments • Conclusions

Introduction • The indexing techniques of text-based information retrieval have been widely adopted in spoken document retrieval. • However, because of imperfect speech recognition results, out-of-vocabulary words, and ambiguity in homophones and word tokenization, conventional text-based indexing techniques are not always appropriate for spoken document retrieval.

Semantic Context Inference (SCI) • The authors propose the semantic context inference representation, which finds the semantic relations between terms and suggests semantic term expansion for speech indexing.

Semantic relation matrix • A spoken document database comprises an accumulation of spoken documents, from which the document-by-term matrix is built.

SCI for indexing • Summing all the semantic inference vectors for the spoken document d yields its semantic context inference vector.
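The sketch below illustrates the general idea of building a term-relation matrix from a document-by-term matrix and summing per-term inference vectors. It rests on loud assumptions: the slides do not give the authors' formula, so cosine similarity between term columns stands in for the semantic relation, and all names are hypothetical.

```python
import math


def semantic_relation_matrix(W):
    """Term-by-term relation matrix from a document-by-term matrix W.

    W is a list of rows (one per document), each a list of term weights.
    ASSUMPTION: relation(i, j) is the cosine similarity of term columns i and j;
    the paper may define the relation differently.
    """
    n_terms = len(W[0])
    cols = [[row[t] for row in W] for t in range(n_terms)]
    # Use 1.0 for all-zero columns to avoid division by zero.
    norms = [math.sqrt(sum(x * x for x in c)) or 1.0 for c in cols]
    return [
        [
            sum(a * b for a, b in zip(cols[i], cols[j])) / (norms[i] * norms[j])
            for j in range(n_terms)
        ]
        for i in range(n_terms)
    ]


def sci_vector(doc_weights, R):
    """Sum the inference vectors of all terms observed in one document.

    Term t contributes doc_weights[t] * R[t], so terms related to the observed
    ones receive weight even if the recognizer never output them.
    """
    n_terms = len(R)
    return [
        sum(doc_weights[t] * R[t][j] for t in range(n_terms))
        for j in range(n_terms)
    ]


# Terms 0 and 1 always co-occur in the collection; term 2 occurs alone.
W = [[1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
R = semantic_relation_matrix(W)
vec = sci_vector([1.0, 0.0, 0.0], R)  # document where only term 0 was recognized
```

In this toy collection, a document in which only term 0 was recognized still receives weight on term 1, which is the kind of expansion that can compensate for recognition errors.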

Retrieval model • For spoken document retrieval, the authors adopt the vector space model, which is widely used in information retrieval because it offers highly efficient retrieval with a feature-vector representation of each document.
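A vector space model ranks documents by the similarity between the query vector and each document vector. The sketch below uses cosine similarity, which is the usual choice but is an assumption here, since the slide does not name the similarity measure; the function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted by decreasing similarity to the query."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {"d1": [0.0, 1.0], "d2": [2.0, 0.0], "d3": [1.0, 1.0]}
ranking = rank_documents([1.0, 0.0], docs)
```

Because cosine similarity normalizes vector length, `d2` scores highest even though its raw weights are larger than the query's.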

Experiments • To measure both the accuracy of the retrieved documents and the ranking positions of the relevant documents, the authors evaluate with mean average precision (MAP).
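For reference, mean average precision can be computed as below. This is the standard definition, not code from the paper; the function names and the toy queries are illustrative.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k where a relevant document appears.

    Relevant documents that are never retrieved contribute zero, because the
    sum is divided by the total number of relevant documents.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """`runs` is a list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),                 # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
```

MAP rewards systems that place relevant documents near the top of the list, which is why it captures both accuracy and ranking position.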

Conclusions • The proposed semantic context inference explores latent semantic information and extends speech indexing with semantically related terms. The semantic context inference vector can be regarded as a re-weighted indexing vector, a form of query expansion that helps overcome speech recognition errors.
