1 / 18

Pruning Analysis for the Position Specific Posterior Lattices for Spoken Document Search

This paper discusses the use of position specific posterior lattices (PSPL) for indexing and searching spoken documents, as well as pruning techniques to optimize the ranking performance. It includes experiments conducted on a lecture material corpus.

wvalle
Download Presentation

Pruning Analysis for the Position Specific Posterior Lattices for Spoken Document Search

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Pruning Analysis for the Position Specific Posterior Lattices for Spoken Document Search Jorge Silva University of Southern California Ciprian Chelba and Alex Acero Microsoft Corporation ICASSP2006 Presented by Shih-Hung Liu 2006/06/12

  2. Outline • Introduction • Position specific posterior lattices (PSPL) • Spoken document indexing and search using PSPL • Pruning techniques • Relative pruning • Absolute pruning • Experiments • Conclusions

  3. Introduction • Speech search has not received much attention due to the fact that large collections of un-transcribed spoken material have not available, mostly due to storage constraints • As storage becomes cheaper, the availability of large collections of spoken documents is limited strictly by the lack of adequate technology to exploit it them • Manually transcribing speech is expensive and sometimes outright impossible due to privacy concerns • This leads us to exploring an automatic approach for searching and navigating spoken document collections

  4. Introduction • The main research aiming at spoken document retrieval (SDR) was centered around the SDR-TREC evaluation • SDR systems index ASR 1-best output and use mean average precision (MAP) • However there are shortcomings to the SDR-TREC framework • the recognizers were heavily tuned for the domain leading to very good ASR performance • The effect of higher WER needs to be explored

  5. Position specific posterior lattices • The early Google approach shows that context and proximity issues were to be important factors for doing the relevance ranking score • If we want to extend this direction to spoken documents, we are faced with a dilemma • using 1-best ASR output is suboptimal due to high WER • ASR lattices do have much better WER, but the position information needed for rescoring a given word hit is not readily available

  6. Position specific posterior lattices • Position specific posterior Lattice (PSPL) was proposed as a way to extend the key-word search paradigm from text documents to spoken documents in scenarios with high WER • The approach calculates posterior probabilities of words at a given integer position – soft indexing – as a way to model the uncertainty of spoken documents

  7. Position specific posterior lattices • Since it is possible that more than one path contains the same word in the same position, one would need to sum over all possible paths in a lattice that contain a given word at given position • The computation for the backward pass stays unchanged, whereas during the forward pass one needs to split the forward probability arriving at a given node , , according to the length of the partial paths

  8. Position specific posterior lattices • Using the forward-backward variables the posterior probability of a given word occurring at a given position in the lattices can be easily calculated : • Finally, the Position Specific Posterior Lattice (PSPL) is a representation of the distribution : for each bin store the words along with their posterior probability

  9. Spoken document indexing and search using PSPL • In our case the speech content of a typical spoken document was approximately 1hr long; it is customary to segment a given speech file in shorter segments • A spoken document thus consists of an ordered list of segments. • For each segment we generate a corresponding PSPL lattices

  10. Spoken document indexing and search using PSPL • For all query terms, a 1-gram score is calculated by summing the PSPL posterior probability across all segment and position • The logarithmic tapering off is used for discounting the effect of large counts in a given document

  11. Spoken document indexing and search using PSPL • We calculate an expected tapered-count for each N-gram in the query • The different proximity types :

  12. Pruning techniques • The idea is to explore how the PSPL ranking performance behaves as a function of the level uncertainty transferred from the ASR part • For doing that, the idea is to prune the PSPL probability distribution before the generation of the N-gram expected counts

  13. Relative pruning • For a given PSPL position bin , the relative pruning first finds the most likely entry given by • The remaining entries are renormalized to make it a proper probability mass function and used to calculate the expected N-gram counts

  14. Absolute pruning • In this case, the PSPL framework considers the PSPL entries whose log-probability is higher than an absolute threshold • The threshold is absolute and consequently when is relatively close to zero, the PSPL contains only the bin entries that have high level of confidence, and some position bins become empty

  15. Experiments • Corpus • iCampus corpus prepared by MIT CSAIL • 169 hrs of lecture material recorded in the classroom • 90 Lectures (78.5 hrs) • 79 Assorted MIT World seminars (89.9 hrs) • The speech style is in between planned and spontaneous

  16. Experiments

  17. Experiments

  18. Conclusions • The PSPL framework provides an alternative way of dealing with the document’s content uncertainty for spoken document retrieval • The proposed pruning techniques provide ways of controlling the number of possible entries in the PSPL, based on their log-probability

More Related