Passage Retrieval using HMMs

Passage Retrieval using HMMs. HARD 2004, University of Illinois at Urbana-Champaign. Jing Jiang, ChengXiang Zhai.


Presentation Transcript


  1. Passage Retrieval using HMMs. HARD 2004, University of Illinois at Urbana-Champaign. Jing Jiang, ChengXiang Zhai.

  2. Motivation – Variable Length Passages. Document excerpts: Nokia, the world’s biggest … acquired Sega … Japanese video game maker, … its mobile N-Gage game … features of a cell phone, MP3-player … Nokia is the cell phone market leader … Nintendo Co.’s … now works as a videophone … which makes mobile and Internet equipment … Nintendo has sold more than 10 million Game Boy … Documents: APE20030922.0156, APE20030911.0887.

  3. Motivation – Variable Length Passages: document-dependent. (Same excerpts as slide 2.) Topic: HARD-422 video game crash. Documents: APE20030922.0156, APE20030911.0887.

  4. Motivation – Variable Length Passages: query-dependent. (Same excerpts as slide 2.) Topics: HARD-422 video game crash; HARD-443 hand-held electronics. Documents: APE20030922.0156, APE20030911.0887.

  5. Research Question • Passage length is document-dependent and query-dependent. • How to detect variable-length passages?

  6. Previous Work on Passage Retrieval • Structural or semantic boundary: passage is not query-specific. • Fixed-length: passage length is not query-specific; passage content may not be coherent. • Arbitrary (MultiText): only query words are considered; heuristics are used to reduce search space. • HMM-based: the method is promising, but previous work didn’t fully explore its potential.

  7. HMM-Based Method [diagram: the document as a sequence of words w w … w]

  8. HMM-Based Method. Q: hand-held electronics. [diagram: the document as a sequence of words w w … w, generated by the state chain B1 → R → B2; the words generated by R form the relevant passage] • p(w|B1): the: 0.060, …, cell: 0.00001, mp3: 0.000005, … • p(w|R): the: 0.031, cell: 0.033, mp3: 0.016, … • p(w|B2): the: 0.060, …, cell: 0.00001, mp3: 0.000005, … • HMM transitions: p(B1|B1) = 0.9, p(R|B1) = 0.1, p(R|R) = 0.95, p(B2|R) = 0.05, p(B2|B2) = 1

  9. HMM-Based Method. Q: hand-held electronics. (Same model as slide 8.) State sequence over the document words: B B … B R R R … R R R B … B
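The labeling on slide 9 can be reproduced with Viterbi decoding. The following is a minimal sketch, not the authors' implementation. It uses only the transition and emission numbers printed on slide 8. Starting in B1, the floor probability `UNSEEN` for words not listed on the slide, and the toy document in the usage note are all assumptions made for illustration.

```python
import math

STATES = ["B1", "R", "B2"]

# Transition probabilities as printed on slide 8.
TRANS = {
    "B1": {"B1": 0.9, "R": 0.1},
    "R": {"R": 0.95, "B2": 0.05},
    "B2": {"B2": 1.0},
}

# Emission probabilities as printed on slide 8 (B1 and B2 share the same values).
P_W_GIVEN_B = {"the": 0.060, "cell": 0.00001, "mp3": 0.000005}
P_W_GIVEN_R = {"the": 0.031, "cell": 0.033, "mp3": 0.016}
EMIT = {"B1": P_W_GIVEN_B, "R": P_W_GIVEN_R, "B2": P_W_GIVEN_B}

UNSEEN = 1e-4  # hypothetical floor for words not listed on the slide


def viterbi(words):
    """Return the most likely state for each word, assuming the chain starts in B1."""
    if not words:
        return []
    neg_inf = float("-inf")
    # score[s] is the best log-probability of any state path ending in s.
    score = {s: neg_inf for s in STATES}
    score["B1"] = math.log(EMIT["B1"].get(words[0], UNSEEN))
    backpointers = []
    for w in words[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev, best = None, neg_inf
            for p in STATES:
                t = TRANS[p].get(s, 0.0)
                if t == 0.0 or score[p] == neg_inf:
                    continue  # transition not allowed or predecessor unreachable
                cand = score[p] + math.log(t)
                if cand > best:
                    best_prev, best = p, cand
            pointers[s] = best_prev
            if best_prev is None:
                new_score[s] = neg_inf
            else:
                new_score[s] = best + math.log(EMIT[s].get(w, UNSEEN))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


doc = ["the"] * 3 + ["cell", "mp3", "cell"] + ["the"] * 6
print(list(zip(doc, viterbi(doc))))
```

On this toy document the three content words are labeled R and the surrounding function words B1 and B2. With fewer trailing words the decoder may prefer to stay in R, because leaving R costs a one-time factor of 0.05.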

  10. Constructing the HMM [diagram: states B1, R, B2]

  11. Constructing the HMM [diagram: states B1, R, B2, E; E is the end-of-doc state]

  12. Constructing the HMM [diagram: states B1, Q, B2, E, B3; E is the end-of-doc state; transition labels 0.005, 0.01, 0.99] • Smoothing achieved by transitions

  13. Constructing the HMM [diagram: states B1, FB, B2, E, B3; E is the end-of-doc state; transition labels 0.005, 0.01, 0.99] • Expanded query LM to incorporate feedback • Smoothing achieved by transitions

  14. Constructing the HMM [diagram: states B1, FB, B2, E, B3; E is the end-of-doc state; transition labels 0.005, 0.01, 0.99] • Transition probabilities trained for each document • Expanded query LM to incorporate feedback • Smoothing achieved by transitions

  15. Passage Extension [diagram: three word sequences over the HMM states B1, FB, B2, E, B3, labeled "true passage", "short passage with artificial boundary", and "passage extended to the natural topical boundary"]

  16. Retrieval – Approach 1

  17. Retrieval – Approach 1 [diagram: items 1, 2, 3, …, n; step labeled "ranking"]

  18. Retrieval – Approach 1 [diagram: items 1, 2, 3, …, n; steps labeled "ranking" and "passage extraction"]

  19. Retrieval – Approach 2 [diagram: items 1, 2, 3, …, n; steps labeled "passage extraction" and "ranking"]

  20. Retrieval – Our Approach [diagram: fixed-length passages 1, 2, 3, …, n; steps labeled "ranking" and "HMM"; the HMM step is marked "our focus"] • f0: 120-word passages, relevance feedback • b0: whole-document ranking, pseudo-feedback • f1: HMM-extended 60-word passages, relevance feedback

  21. Passage-Level Results • Overall, baseline was the best.

  22. Effectiveness of HMM method • HMM method improved performance over fixed-length passages • Less improvement if fixed-length closer to optimal length

  23. Diagnosis Runs • KL-divergence works poorly on passages • Non-optimal parameter setting • HMM improves boundaries
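For readers unfamiliar with KL-divergence ranking, the sketch below scores a piece of text by the negative KL divergence between a query language model and a smoothed language model of the text. It is an illustration only: the slides do not give the smoothing method or its parameters, so Dirichlet smoothing, the value of `mu`, the floor for unseen collection words, and the toy models are all assumptions.

```python
import math
from collections import Counter


def neg_kl_score(query_lm, tokens, collection_lm, mu=1000.0):
    """Return -KL(query_lm || text_lm), summed over the query words.

    text_lm is the Dirichlet-smoothed model of `tokens` (an assumption):
    p(w | text) = (count(w) + mu * p(w | collection)) / (len(tokens) + mu)
    """
    counts = Counter(tokens)
    length = len(tokens)
    score = 0.0
    for word, q in query_lm.items():
        if q <= 0.0:
            continue  # words with zero query probability contribute nothing
        p_coll = collection_lm.get(word, 1e-9)  # hypothetical floor
        p_text = (counts[word] + mu * p_coll) / (length + mu)
        score += q * math.log(p_text / q)
    return score


# Toy models; the collection values reuse the background numbers from slide 8.
query_lm = {"cell": 0.5, "mp3": 0.5}
collection_lm = {"the": 0.060, "cell": 0.00001, "mp3": 0.000005}

on_topic = ["the", "cell", "mp3", "cell"]
off_topic = ["the", "the", "the", "the"]
print(neg_kl_score(query_lm, on_topic, collection_lm))
print(neg_kl_score(query_lm, off_topic, collection_lm))
```

A higher (less negative) score means the text model is closer to the query model, so the on-topic text ranks above the off-topic one.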

  24. Discussions and Conclusions • HMM method improved the performance over fixed-length passages • LM (KL-divergence) method gives worse performance on passage ranking than on document ranking

  25. The End • Questions?
