Passage Retrieval using HMMs
HARD 2004
University of Illinois at Urbana-Champaign
Jing Jiang, ChengXiang Zhai
Motivation – Variable Length Passages
• Passage length is document-dependent and query-dependent.
• Documents: APE20030922.0156, APE20030911.0887
• Queries: HARD-422 (video game crash), HARD-443 (hand-held electronics)
• Excerpts shown: Nokia, the world’s biggest … acquired Sega … Japanese video game maker, … its mobile N-Gage game … features of a cell phone, MP3-player … Nokia is the cell phone market leader … Nintendo Co.’s … now works as a videophone … which makes mobile and Internet equipment … Nintendo has sold more than 10 million Game Boy …
Research Question
• Passage length is
    • document-dependent
    • query-dependent
• How to detect variable-length passages?
Previous Work on Passage Retrieval
• Structural or semantic boundary
    • Passage is not query-specific.
• Fixed-length
    • Passage length is not query-specific.
    • Passage content may not be coherent.
• Arbitrary – MultiText
    • Only query words are considered.
    • Heuristics are used to reduce search space.
• HMM-based
    • The method is promising, but previous work didn’t fully explore its potential.
HMM-Based Method
• Q: hand-held electronics
• Document: w w … w w w w … w w w w … w
• HMM states: B1 → R → B2 (R covers the relevant passage)
• Output probabilities:
    • p(w|B1): the: 0.060, …, cell: 0.00001, mp3: 0.000005, …
    • p(w|R): the: 0.031, cell: 0.033, mp3: 0.016, …
    • p(w|B2): the: 0.060, …, cell: 0.00001, mp3: 0.000005, …
• Transition probabilities: p(B1|B1) = 0.9, p(R|B1) = 0.1, p(R|R) = 0.95, p(B2|R) = 0.05, p(B2|B2) = 1
• Decoded state sequence: B B … B R R R … R R R B … B
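The state sequence B … B R … R B … B is the kind of labelling that Viterbi decoding of a left-to-right HMM produces. The sketch below is an illustration only, not the authors' implementation. It uses the transition probabilities from the slide and the slide's values for `the`, `cell`, and `mp3`; the start state (B1), the `FLOOR` probability for all other words, and the function names are assumptions made for this example.

```python
import math

STATES = ["B1", "R", "B2"]

# Transition probabilities from the slide; pairs not listed are impossible.
TRANS = {
    ("B1", "B1"): 0.9, ("B1", "R"): 0.1,
    ("R", "R"): 0.95, ("R", "B2"): 0.05,
    ("B2", "B2"): 1.0,
}

# Output probabilities from the slide (only the three words it lists).
BACKGROUND = {"the": 0.060, "cell": 0.00001, "mp3": 0.000005}
EMIT = {
    "B1": BACKGROUND,
    "R": {"the": 0.031, "cell": 0.033, "mp3": 0.016},
    "B2": BACKGROUND,
}
FLOOR = 1e-4  # hypothetical probability for any word the slide does not list


def log_emit(state, word):
    return math.log(EMIT[state].get(word, FLOOR))


def viterbi(words):
    """Return the most likely state sequence, assuming the model starts in B1."""
    neg_inf = float("-inf")
    score = {s: neg_inf for s in STATES}
    score["B1"] = log_emit("B1", words[0])
    back = []  # back[t][state] = best predecessor of `state` at step t + 1
    for word in words[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev, best = None, neg_inf
            for prev in STATES:
                p = TRANS.get((prev, cur))
                if p is None or score[prev] == neg_inf:
                    continue
                cand = score[prev] + math.log(p)
                if cand > best:
                    best_prev, best = prev, cand
            pointers[cur] = best_prev
            new_score[cur] = neg_inf if best_prev is None else best + log_emit(cur, word)
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


words = ["the"] * 3 + ["cell", "mp3", "cell", "mp3"] + ["the"] * 6
print(viterbi(words))
```

The run of R labels marks the extracted passage. Because this sketch places no constraint on the final state, a very short tail after the passage can be absorbed into R instead of switching to B2.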
Constructing the HMM
• Start with three states: B1, R, B2.
• Add an end-of-doc state E.
• Replace R with a query state Q and add a state B3; smoothing is achieved by transitions (values on the diagram: 0.005, 0.01, 0.99).
• Replace Q with FB, an expanded query LM to incorporate feedback.
• Transition probabilities are trained for each document.
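One way to read "smoothing achieved by transitions": if, at each word position, the model is in the query state with some probability and in a background state otherwise, the resulting word distribution is a linear mixture of the two language models. The sketch below shows only that equivalence. The slide text does not say which transitions carry the values 0.005, 0.01, and 0.99, so the `weight` parameter, the toy language models, and the function name are all hypothetical.

```python
def mixture_emission(query_lm, background_lm, weight):
    """Word distribution when the query state is occupied with probability `weight`
    and the background state with probability 1 - weight (an assumed reading)."""
    vocab = set(query_lm) | set(background_lm)
    return {
        w: weight * query_lm.get(w, 0.0) + (1 - weight) * background_lm.get(w, 0.0)
        for w in vocab
    }


# Hypothetical toy language models, for illustration only.
query_lm = {"cell": 0.5, "mp3": 0.5}
background_lm = {"the": 0.9, "cell": 0.06, "mp3": 0.04}

smoothed = mixture_emission(query_lm, background_lm, weight=0.9)
print(smoothed)
```

Words missing from the query model, such as `the` here, receive non-zero probability through the background state, which is the usual purpose of smoothing.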
Passage Extension
• A short fixed-length passage has an artificial boundary that need not match the true passage.
• The HMM (states B1, FB, B2, E, B3) is applied to the document containing the short passage.
• The passage is extended to the natural topical boundary.
Retrieval – Approach 1
• Items 1, 2, 3, …, n
• ranking
• passage extraction
Retrieval – Approach 2
• Items 1, 2, 3, …, n
• passage extraction
• ranking
Retrieval – Our Approach
• Pipeline: fixed-length passages (1, 2, 3, …, n) → ranking → HMM → passages (1, 2, 3, …, n)
• Our focus: the HMM step
• Runs:
    • f0: 120-word passages, relevance feedback
    • b0: whole-document ranking, pseudo-feedback
    • f1: HMM-extended 60-word passages, relevance feedback
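The fixed-length passages used as the starting point (120 words in f0, 60 words in f1) can be sketched as word windows over a document. This is a minimal illustration under assumptions the slides do not confirm: whitespace tokenization, non-overlapping windows, and a shorter final window. The function name is hypothetical.

```python
def fixed_length_passages(text, length):
    """Return (start, end) word offsets of consecutive windows of `length` words."""
    if length <= 0:
        raise ValueError("length must be positive")
    n_words = len(text.split())  # assumption: whitespace tokenization
    return [
        (start, min(start + length, n_words))
        for start in range(0, n_words, length)
    ]


doc = " ".join(["w"] * 150)  # placeholder document of 150 words
print(fixed_length_passages(doc, 60))
```

Each (start, end) pair could then be ranked as a unit and handed to a boundary-adjustment step such as the HMM.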
Passage-Level Results • Overall, baseline was the best.
Effectiveness of HMM method
• The HMM method improved performance over fixed-length passages.
• The improvement is smaller when the fixed length is closer to the optimal length.
Diagnosis Runs
• KL-divergence works poorly on passages
• Non-optimal parameter setting
• HMM improves boundaries
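For reference, KL-divergence ranking scores a text by the negative KL divergence between a query language model and a smoothed language model of the text. The sketch below is a generic illustration, not the system's code: Jelinek-Mercer smoothing, the `lam` value, the toy models, and the function name are assumptions, and the collection model must give non-zero probability to every query word.

```python
import math
from collections import Counter


def kl_score(query_lm, text_words, collection_lm, lam=0.5):
    """Negative KL divergence of the query model from a smoothed text model.

    Higher (closer to zero) means a better match. Smoothing here is
    Jelinek-Mercer interpolation with a collection model (an assumption).
    """
    counts = Counter(text_words)
    total = len(text_words)
    score = 0.0
    for word, q in query_lm.items():
        if q == 0.0:
            continue
        ml = counts[word] / total if total else 0.0
        p = (1 - lam) * ml + lam * collection_lm[word]
        score -= q * math.log(q / p)
    return score


# Hypothetical toy models and texts, for illustration only.
query_lm = {"cell": 0.5, "mp3": 0.5}
collection_lm = {"cell": 0.01, "mp3": 0.01, "the": 0.98}
on_topic = "the cell mp3 cell".split()
off_topic = "the the the the".split()

print(kl_score(query_lm, on_topic, collection_lm))
print(kl_score(query_lm, off_topic, collection_lm))
```

The same function applies to a whole document or to a passage; only the word counts used to estimate the text model change.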
Discussions and Conclusions
• The HMM method improved performance over fixed-length passages.
• The LM (KL-divergence) method gives worse performance on passage ranking than on document ranking.
The End • Questions?