Mining Dependency Relations for Query Expansion in Passage Retrieval. Renxu Sun, Chai-Huat Ong, Tat-Seng Chua. National University of Singapore. SIGIR 2006.
Introduction • Query expansion (QE) is a method for improving the effectiveness of IR • by providing additional contextual information to the original queries • Traditional passage retrieval algorithms perform a density based weighting of query terms • prefer passages containing query terms that are close together
Introduction • Local Context Analysis (LCA) [Croft, 1996] • A common QE technique based on term co-occurrence statistics • utilizes only statistical information instead of semantic information • unable to differentiate between noisy and good quality expansion terms
Introduction • [Cui et al., 2005] • The use of a fuzzy dependency relation matching method for passage retrieval • significant improvement in MRR over the density based passage retrieval systems • This work points towards the importance of performing syntactical analysis • The longer queries benefit more from this method • Query expansion is needed for short queries
Introduction • The main contribution of this paper is employing a relation based model to perform: • contextual term selection to enhance density based passage retrieval • relation extraction to enhance the fuzzy dependency relation matching approach • To make the expansion process more robust, it extracts relations and terms from an external corpus (the Web).
Query Expansion Based on Dependency Relation Fig. Framework of Relation Based Query Expansion
Dependency Relation Paths from Web Snippets • The Web is considered as a parallel corpus: • Send the queries to Google and collect the top k snippets (k=100, similar to LCA [Croft, 1996]) • Each sentence is considered as a passage, and each snippet contains 2 sentences on average • Use Minipar, a dependency grammar parser, to parse the passages.
Examples of Parse Tree Fig. The parse trees of the sample question and sentence. <When, wha, head, purchased> is a relation path. The directions of relations are ignored in the experiments.
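To make the notion of a relation path concrete, here is a minimal sketch of reading one off a dependency parse while ignoring edge direction, as the caption describes. The `relation_path` helper, the edge triples, and the relation labels are hypothetical illustrations, not actual Minipar output.

```python
from collections import deque


def relation_path(edges, start, end):
    """Return [start, rel_1, ..., rel_m, end] along the shortest
    undirected path between two words, or None if they are not connected.

    edges: iterable of (head, relation, dependent) triples.
    """
    # Build an undirected adjacency list: direction is ignored.
    graph = {}
    for head, rel, dep in edges:
        graph.setdefault(head, []).append((rel, dep))
        graph.setdefault(dep, []).append((rel, head))

    # Breadth-first search, carrying the relation labels seen so far.
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, rels = queue.popleft()
        if node == end:
            return [start, *rels, end]
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, rels + [rel]))
    return None


# Hypothetical parse of a made-up sentence (labels are illustrative only).
edges = [
    ("purchased", "subj", "company"),
    ("purchased", "obj", "studio"),
    ("purchased", "mod", "in"),
    ("in", "pcomp-n", "1995"),
]
path = relation_path(edges, "company", "1995")
```

Here `path` links the two words through the relations on the tree edges between them, in the same <Start_Node, Rel1, …, Relm, End_Node> shape used on the later slides.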
Term Expansion for Density Based Retrieval System (1/2) • Ranking candidate expanded terms • A variant of the ranking formula used in LCA • Global importance • IDF of the expanded term • Local importance • The relation path linking to the query term • Adding the top k terms to the original query, with the i-th ranked term weighted by (1-0.9*i/k)
Term Expansion for Density Based Retrieval System (2/2) where Tk = the term to be ranked; idfTk = max(1.0, log10(N/NTk)); idfti = max(1.0, log10(N/Nti)); pj = the jth passage in the passage set P; score(Reli) = the score of an individual relation, which is obtained through training; δ is set to 0.1 to avoid zero values.
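The two pieces of the ranking that are fully stated on these slides, the clipped IDF and the rank-based expansion weight, can be sketched as follows. The function names are hypothetical, and treating i as a zero-based rank (so the top term gets weight 1.0) is an assumption; the slides do not say where i starts.

```python
import math


def clipped_idf(total_passages, passages_with_term):
    """idf = max(1.0, log10(N / N_t)), as defined on the slide."""
    return max(1.0, math.log10(total_passages / passages_with_term))


def expansion_weights(ranked_terms):
    """Weight the i-th ranked term by 1 - 0.9 * i / k.

    Assumption: i is zero-based, so weights fall from 1.0 towards 0.1.
    """
    k = len(ranked_terms)
    return {term: 1 - 0.9 * i / k for i, term in enumerate(ranked_terms)}


# Hypothetical expansion terms, already sorted best-first.
weights = expansion_weights(["acquired", "bought", "deal"])
```

The clipping at 1.0 keeps very common terms from receiving an IDF below 1, and the linear decay means lower ranked expansion terms contribute less to the expanded query.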
Relation Based Retrieval Method (RBM) • RBM is used to perform passage re-ranking based on the initial retrieval result obtained by the density based method (DBM). • The similarity between passage S and Q is computed by finding all possible relation path pairs (PS, PQ) from S and Q that have the same starting and ending nodes. • The translation probability Prob(PS|PQ) is the sum over all possible alignments:
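The formula itself is not reproduced in this transcript. As a rough illustration only, one standard way to sum a translation probability over all alignments is the IBM Model 1 style factorization, sketched below. The functional form, the uniform 1/m^n alignment prior, and the translation table `t` with its values are all assumptions made for this sketch, not the authors' stated definition.

```python
from itertools import product


def prob_bruteforce(path_s, path_q, t):
    """Sum over every alignment of each relation in path_s to one in path_q."""
    n, m = len(path_s), len(path_q)
    total = 0.0
    for alignment in product(range(m), repeat=n):
        p = 1.0
        for i, j in enumerate(alignment):
            p *= t.get((path_s[i], path_q[j]), 0.0)
        total += p
    return total / (m ** n)  # uniform prior over alignments (assumed)


def prob_factorized(path_s, path_q, t):
    """Same quantity, using the product-of-sums identity (no enumeration)."""
    m = len(path_q)
    p = 1.0
    for rel_s in path_s:
        p *= sum(t.get((rel_s, rel_q), 0.0) for rel_q in path_q) / m
    return p


# Made-up relation translation probabilities, for illustration only.
t = {("subj", "subj"): 0.8, ("subj", "obj"): 0.1,
     ("mod", "mod"): 0.6, ("mod", "subj"): 0.2}
path_s, path_q = ["subj", "mod"], ["subj", "obj", "mod"]
brute = prob_bruteforce(path_s, path_q, t)
fast = prob_factorized(path_s, path_q, t)
```

Both functions return the same value; the factorized form avoids enumerating the m^n alignments, which matters once relation paths grow beyond a few edges.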
Relation Path Expansion • A technique to be used on top of the fuzzy relation based retrieval [Cui et al., 2005] • The path expansion technique extracts additional relation paths linking the expanded terms with original query terms. • Select the path associated with Tk that has the maximum path_score(Tk,t,j) to be expanded, weighted by (1-0.9*i/k)
Model Training • Retrieve the top 100 snippets from Google for each Qi. • A path <Start_Node, Rel1, …, Relm, End_Node> in the snippets is “relevant” if • The relevant paths are those inferring a useful term to the question. • Employ a unigram language model to train the weight of each relation:
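The training formula is not reproduced in this transcript. As a hedged sketch, a unigram model over relations can be estimated by relative frequency across the relevant paths. Using the raw maximum-likelihood estimate with no smoothing, and the `relation_scores` name, are assumptions made here; the training paths are made up.

```python
from collections import Counter


def relation_scores(relevant_paths):
    """Estimate a unigram probability for each relation label.

    relevant_paths: list of [start_node, rel_1, ..., rel_m, end_node].
    Only the relation labels (everything between the two nodes) are counted.
    """
    counts = Counter(rel for path in relevant_paths for rel in path[1:-1])
    total = sum(counts.values())
    return {rel: count / total for rel, count in counts.items()}


# Made-up relevant paths, for illustration only.
paths = [
    ["company", "subj", "obj", "studio"],
    ["founder", "subj", "company"],
]
scores = relation_scores(paths)
```

Relations that occur often in paths leading to useful terms receive higher scores, which is the role score(Reli) plays in the term ranking.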
Evaluations • The evaluations aim to verify three hypotheses • It is effective to incorporate the dependency relation based query expansion technique to select high quality terms in a density based method. • The use of the dependency relation based query expansion technique to extract relation paths further improves the precision of passage ranking when integrated with the fuzzy relation matching method. • As short queries with fewer key terms are likely to have word mismatch problems, they should benefit more from relation based query expansion than long queries.
Experiment Setup • Training data • 10,255 factoid QA pairs from TREC-8 and TREC-9 QA tasks • The top 100 snippets from Google for each question • 8,892 relevant paths extracted • Testing data • The AQUAINT news corpus • 324 factoid questions in TREC-12 QA task • Excluding 30 questions with NIL answers and 59 questions that do not have any ground truth passages • 5 Comparison systems • DBS, DBS+LCA, DBS+DRQET, RBS, RBS+DRQER
Experiment Result-1 Table 1. Overall performance comparison. All improvements are significant.
Experiment Result-2 Fig. MRR before and after query expansion vs. number of non-trivial question terms.
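For reference, MRR (mean reciprocal rank), the metric plotted here, can be computed as in the short sketch below. The input format, one rank per question with `None` when no correct passage is returned, is an assumption of this example.

```python
def mean_reciprocal_rank(first_correct_ranks):
    """Average of 1/rank of the first correct passage for each question.

    first_correct_ranks: 1-based ranks, or None when no correct passage
    was retrieved (that question contributes 0).
    """
    if not first_correct_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in first_correct_ranks if rank is not None)
    return total / len(first_correct_ranks)


# Four hypothetical questions: correct passage at rank 1, 2, not found, 4.
mrr = mean_reciprocal_rank([1, 2, None, 4])
```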
Experiment Result-3 • Testing dataset 2: 356 short queries in TREC-11 and TREC-12 QA tasks • The improvement is more significant than that in table 1. • DBS+DRQET performs better than RBS.
Conclusion and Future Work • Two dependency relation based query expansion techniques, DRQET and DRQER, are presented. • The experimental results show that RBS+DRQER performs best among the 5 systems. • We also studied the relationship between query lengths and improvements by query expansion. • Directions for future work: (1) explore the use of different models and their combinations for relation based query expansion; (2) conduct detailed analysis on the performance of RBS+DRQER on different types of queries.