
Chen, Yi-wen Dept. of Computer Science & Information Engineering

2012 ICASSP. Semantic Query Expansion and Context-based Discriminative Term Modeling for Spoken Document Retrieval. Tsung-wei Tu, Hung-yi Lee, Yu-yu Chou, Lin-shan Lee. Chen, Yi-wen, Dept. of Computer Science & Information Engineering, National Taiwan Normal University. 2012/4/17.


Presentation Transcript


  1. 2012 ICASSP. Semantic Query Expansion and Context-based Discriminative Term Modeling for Spoken Document Retrieval. Tsung-wei Tu, Hung-yi Lee, Yu-yu Chou, Lin-shan Lee. Chen, Yi-wen, Dept. of Computer Science & Information Engineering, National Taiwan Normal University. 2012/4/17

  2. Spoken Document Retrieval. Text information retrieval versus speech information retrieval: speech recognition output contains many recognition errors. One way to handle this problem is to estimate term probabilities from lattices, which include many recognition hypotheses.

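  3. Document Model. A spoken document is first divided into spoken segments, and each spoken segment is then transcribed into a lattice. The probability of a term in a segment is the sum, over all possible word sequences in the segment's lattice, of two factors multiplied together: the probability that the acoustic and language models select that word sequence, and the average probability that a single arc of the sequence carries the term. The quantities involved are: a word sequence in the lattice; the set of all possible word sequences in the lattice for the segment; the posterior probability of the word sequence, derived from the acoustic and language models; the number of word arcs in the word sequence; and the occurrence count of the term in the word sequence.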
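
The segment-level estimate described on this slide can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the lattice is approximated by an explicit list of (word sequence, posterior) hypotheses, and the function name `segment_term_probs` is hypothetical. Each hypothesis contributes its posterior times the fraction of its word arcs that carry the term.

```python
from collections import Counter

def segment_term_probs(hypotheses):
    """Estimate term probabilities for one spoken segment.

    hypotheses: list of (word_sequence, posterior) pairs standing in for
    the lattice (an assumption; a real lattice is a graph, not a list).
    Posteriors are renormalised so they sum to one over the list.
    """
    total_posterior = sum(post for _, post in hypotheses)
    probs = Counter()
    for words, post in hypotheses:
        if not words:
            continue  # an empty hypothesis has no word arcs to count
        counts = Counter(words)
        for term, n in counts.items():
            # (occurrence count / number of word arcs) * sequence posterior
            probs[term] += (n / len(words)) * (post / total_posterior)
    return dict(probs)

if __name__ == "__main__":
    hyps = [(["spoken", "document"], 0.6), (["spoken", "spoken"], 0.4)]
    print(segment_term_probs(hyps))
```

Because every hypothesis distributes its posterior over its own arcs, the resulting probabilities sum to one across terms when no hypothesis is empty.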

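  4. Document Model. The expected length of a segment is the expected number of word arcs per word sequence, taken over the word sequences in the segment's lattice. The document model gives the probability that a term w occurs in the document, computed at the lattice level. Query Model. Here we borrow the query-regularized mixture model, originally proposed for text information retrieval, for query expansion. Its parameters are the expanded query model and a weight for each of the top M documents.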
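
One plausible way to combine segment-level estimates into a document model is sketched below. The exact formula on the slide did not survive in this transcript, so the weighting is an assumption: each segment's term probabilities are weighted by that segment's expected length. The names `expected_segment_length` and `document_term_probs` are hypothetical.

```python
def expected_segment_length(hypotheses):
    """Expected number of word arcs per word sequence in one segment.

    hypotheses: list of (word_sequence, posterior) pairs approximating
    the lattice (an assumption made for illustration).
    """
    total_posterior = sum(post for _, post in hypotheses)
    weighted = sum(len(words) * post for words, post in hypotheses)
    return weighted / total_posterior

def document_term_probs(segments):
    """Combine per-segment term probabilities into a document model.

    segments: list of (term_probs, expected_length) pairs, one per
    segment. Longer segments contribute proportionally more (assumed).
    """
    total_length = sum(length for _, length in segments)
    doc_model = {}
    for term_probs, length in segments:
        for term, p in term_probs.items():
            doc_model[term] = doc_model.get(term, 0.0) + p * length / total_length
    return doc_model

if __name__ == "__main__":
    segs = [({"speech": 0.7, "query": 0.3}, 2.0), ({"query": 1.0}, 1.0)]
    print(document_term_probs(segs))
```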

  5. Document Model. Note that here the positive and negative examples of each term are selected automatically in an unsupervised way, similar to the scenario of pseudo-relevance feedback.

  6. Semantic Query Expansion. The parameters of the word-level model, namely the expanded query model and the document weights, are estimated by maximizing objective function (7). Instead of estimating a query-dependent language model for the word distribution, we now seek to estimate a query-dependent language model for the distribution of latent topics. We assume the probabilities of observing all words given each latent topic are available; they are obtained from Probabilistic Latent Semantic Analysis (PLSA). The word probability in (5) and (6) is replaced by its topic-based counterpart, and the parameters are similarly estimated by maximizing the objective function (8), in parallel with (7). The topic distribution in (8) is the quantity to solve for.
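
The word-level estimation step can be illustrated with a simplified EM loop in the spirit of a query-regularized mixture model. This is a sketch under assumptions, not the objective (7) from the slides, which is not recoverable from this transcript: each top-ranked document is modelled as a mixture of the expanded query model and a fixed background model, and the original query acts as a prior with strength `mu`. The function name `expand_query` and all parameter names are hypothetical.

```python
def expand_query(query_model, doc_counts, background, mu=1.0, iters=30):
    """Estimate an expanded query model and per-document weights with EM.

    query_model: dict term -> P(term | original query)
    doc_counts:  list of dicts term -> (possibly fractional) count,
                 one per top-ranked document; must be non-empty dicts
    background:  dict term -> P(term | collection), positive for every
                 term that appears in doc_counts
    """
    vocab = list(background)
    # Start from a smoothed query model so no probability is zero.
    theta = {w: 0.5 * query_model.get(w, 0.0) + 0.5 * background[w] for w in vocab}
    alpha = [0.5] * len(doc_counts)
    for _ in range(iters):
        # The prior contributes mu pseudo-counts spread by the original query.
        pseudo = {w: mu * query_model.get(w, 0.0) for w in vocab}
        new_alpha = []
        for d, counts in enumerate(doc_counts):
            topical, total = 0.0, 0.0
            for w, c in counts.items():
                from_query = alpha[d] * theta[w]
                # E-step: probability the term came from the query model.
                z = from_query / (from_query + (1.0 - alpha[d]) * background[w])
                pseudo[w] += c * z
                topical += c * z
                total += c
            new_alpha.append(topical / total)
        # M-step: renormalise pseudo-counts and update document weights.
        norm = sum(pseudo.values())
        theta = {w: v / norm for w, v in pseudo.items()}
        alpha = new_alpha
    return theta, alpha

if __name__ == "__main__":
    bg = {"speech": 0.25, "retrieval": 0.25, "lattice": 0.25, "the": 0.25}
    docs = [{"speech": 3, "lattice": 2, "the": 5}, {"speech": 2, "retrieval": 1, "the": 4}]
    print(expand_query({"speech": 1.0}, docs, bg))
```

Counts may be fractional, so expected term counts derived from lattices can be passed in directly.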

  7. Semantic Query Expansion. The probability to be used in (1) is then computed from the estimated topic distribution. This probability can be further interpolated with the probability obtained by maximizing (7).
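
The topic-based word probability and its interpolation with the word-level model can be sketched as follows. This assumes the standard PLSA decomposition, in which a word's probability is the sum over topics of P(word | topic) times the topic weight; the slide's own formula is missing from this transcript. The function names and the interpolation weight `lam` are hypothetical.

```python
def topic_to_word_model(topic_weights, word_given_topic):
    """Turn a topic distribution into a word distribution.

    topic_weights:    list of P(topic k | expanded query), summing to one
    word_given_topic: list of dicts term -> P(term | topic k), as would
                      be obtained from a trained PLSA model
    """
    model = {}
    for weight, word_probs in zip(topic_weights, word_given_topic):
        for term, p in word_probs.items():
            model[term] = model.get(term, 0.0) + weight * p
    return model

def interpolate(word_model, topic_model, lam=0.5):
    """Linearly interpolate the word-level and topic-level query models."""
    vocab = set(word_model) | set(topic_model)
    return {
        term: lam * word_model.get(term, 0.0) + (1.0 - lam) * topic_model.get(term, 0.0)
        for term in vocab
    }

if __name__ == "__main__":
    topics = [{"speech": 0.5, "audio": 0.5}, {"query": 1.0}]
    topic_model = topic_to_word_model([0.8, 0.2], topics)
    print(interpolate({"speech": 1.0}, topic_model, lam=0.5))
```

The topic-level model can give probability mass to words that never appear in the top-ranked documents, which is the motivation for interpolating it with the word-level estimate.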

  8. Experimental Results

  9. Experimental Results

  10. References • Tsung-wei Tu, Hung-yi Lee, Yu-yu Chou and Lin-shan Lee, “Semantic Query Expansion and Context-based Discriminative Term Modeling for Spoken Document Retrieval”, in ICASSP, 2012. • Tao Tao and ChengXiang Zhai, “Regularized estimation of mixture models for robust pseudo-relevance feedback”, in SIGIR, 2006.
