Tuning Before Feedback: Combining Ranking Discovery and Blind Feedback for Robust Retrieval* Weiguo Fan, Ming Luo, Li Wang, Wensi Xi, and Edward A. Fox Digital Library Research Laboratory, Virginia Tech *This research is supported by the National Science Foundation under Grant Numbers IIS-0325579, DUE-0136690, and DUE-0333531
Outline • Introduction • Research Questions • Approach: Ranking Tuning + Blind Feedback • Experiment • Results • Conclusion
Introduction • Ranking functions play an important role in IR performance • Blind feedback (pseudo-relevance feedback) has been found very useful for ad hoc retrieval • Why not combine ranking function optimization with blind feedback to improve robustness?
Research Questions • Does blind feedback work even better on fine-tuned ranking functions as compared to on traditional ranking functions such as Okapi BM25? • Does the type of query (very short vs. very long) have any impact on the combination approach? • Can the ranking function discovered, in combination with blind feedback, extrapolate well for new unseen queries?
Our Approach • Use ARRANGER • a Genetic Programming-based discovery engine • to perform the ranking function tuning • [Fan 2003 TKDE, Fan 2004 IP&M, Fan 2004 JASIST] • Combine ranking tuning and feedback • Test on different types of queries
RF Discovery Problem • [Diagram: training data and feedback flow into Ranking Function Discovery as input; the discovered ranking function f is the output]
Ranking Function Optimization • Ranking Function Tuning is an art! – Paul Kantor • Why not adaptively discover RFs with Genetic Programming? • Huge search space • Discrete objective function • Modeling advantage • What is GP? • A problem-solving system based on the principles of evolution and heredity • Widely used for structure discovery, functional form discovery, and other data mining and optimization tasks
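The tree-shaped ranking functions that GP evolves can be evaluated directly against per-term statistics. A minimal Python sketch follows; the tuple encoding, operator set, and the stat names tf/df/N are illustrative assumptions, not ARRANGER's actual representation:

```python
# Sketch: evaluating a GP-discovered ranking function represented as an
# expression tree over term statistics. A tree is either a leaf (a statistic
# name such as "tf", "df", "N") or a tuple (op, left_subtree, right_subtree).
# Encoding and feature names are assumptions for illustration.

OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b else 0.0,  # protected division, common in GP
}

def evaluate(tree, stats):
    """Evaluate a tree such as ('*', 'tf', ('/', 'N', 'df')) on one term's stats."""
    if isinstance(tree, str):
        return float(stats[tree])           # leaf: look up the statistic
    op, left, right = tree
    return OPS[op](evaluate(left, stats), evaluate(right, stats))

def score(tree, query_terms, doc_stats):
    """Document score = sum of the ranking function over matching query terms."""
    return sum(evaluate(tree, doc_stats[t]) for t in query_terms if t in doc_stats)
```

For example, the tree `("*", "tf", ("/", "N", "df"))` encodes the tf-idf-like function tf * N/df; GP searches this space of trees for high-fitness forms.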
Genetic Algorithms/Programming • Representation: • Vector of bit strings or real numbers for GA • Complex data structures: trees, arrays for GP • Genetic transformation • Reproduction • Crossover • Mutation • IR application • [Gordon’88, ’91], [Chen’98a, ’98b], [Pathak’00], etc.
Example of Crossover in GP • [Diagram: in generation N, parent trees tf*(tf+df) and N/df+df exchange subtrees via crossover, yielding generation N+1 children tf*(N/df) and (tf*df)+df]
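The subtree swap illustrated above can be sketched in a few lines of Python. The tuple-based tree encoding and helper names are assumptions for illustration, not the actual ARRANGER code:

```python
import random

# A ranking function is an expression tree: a tuple (op, left, right) or a
# leaf string such as "tf", "df", or "N" (illustrative encoding).

def subtree_paths(tree, path=()):
    """Enumerate the path (sequence of child indices) to every node."""
    yield path
    if isinstance(tree, tuple):
        _, left, right = tree
        yield from subtree_paths(left, path + (1,))
        yield from subtree_paths(right, path + (2,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    if not path:
        return new
    op, left, right = tree
    if path[0] == 1:
        return (op, replace_subtree(left, path[1:], new), right)
    return (op, left, replace_subtree(right, path[1:], new))

def crossover(p1, p2, rng=random):
    """Swap a randomly chosen subtree of p1 with one of p2, yielding two children."""
    path1 = rng.choice(list(subtree_paths(p1)))
    path2 = rng.choice(list(subtree_paths(p2)))
    s1, s2 = get_subtree(p1, path1), get_subtree(p2, path2)
    return replace_subtree(p1, path1, s2), replace_subtree(p2, path2, s1)
```

With parents tf*(tf+df) and N/df+df, swapping the (tf+df) subtree for N/df produces the child tf*(N/df), as in the diagram.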
The ARRANGER Engine • [Flowchart: Start → Initialize Population → Evaluate Fitness → Stop? → Apply Crossover (loop back) → Validate and Output → End] • Split the training data into training and validation sets • Generate an initial population of random "ranking functions" • Evaluate the fitness of each "ranking function" in the population and record the 10 best ones • If the stopping criterion is not met, generate the next generation of the population by genetic transformation and go to Step 3 • Validate the recorded best "ranking functions" and select the best one as the RF
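The steps above can be sketched as a generic GP loop. The operator signatures, elitism scheme, and toy defaults are assumptions, not the actual ARRANGER implementation:

```python
import random

# A minimal sketch of an ARRANGER-style discovery loop. Fitness, variation
# operators, and the individual representation are supplied by the caller;
# the defaults and elitism scheme here are illustrative assumptions.

def discover(make_random, fitness, crossover_fn, mutate_fn,
             pop_size=50, generations=30, elite=10, seed=0):
    rng = random.Random(seed)
    # Step 2: an initial population of random candidate ranking functions
    pop = [make_random(rng) for _ in range(pop_size)]
    best = []  # running record of the best individuals seen (Step 3)
    for _ in range(generations):
        # Step 3: evaluate fitness, e.g. average precision on the training split
        scored = sorted(pop, key=fitness, reverse=True)
        best = sorted(best + scored[:elite], key=fitness, reverse=True)[:elite]
        # Step 4: next generation by genetic transformation, keeping the elites
        nxt = scored[:elite]
        while len(nxt) < pop_size:
            a, b = rng.sample(scored[:pop_size // 2], 2)
            nxt.append(mutate_fn(crossover_fn(a, b, rng), rng))
        pop = nxt
    # Step 5: a held-out validation split would pick the final RF among `best`
    return best[0]
```

As a smoke test, plugging in a toy one-dimensional problem (maximize -|x - 7|) shows the loop concentrating the population near the optimum.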
Blind Feedback • Automatically adds terms to a user's query, assuming the top-ranked documents are relevant, to enhance search engine performance • Some examples • Rocchio (performs best in our experiment) • Dec-Hi • Kullback-Leibler Divergence (KLD) • Chi-Square
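A minimal sketch of Rocchio-style blind (pseudo-relevance) feedback over term-weight dictionaries follows; the alpha/beta values and the vector representation are conventional assumptions, not the exact setup used in the experiments:

```python
from collections import Counter

# Sketch of Rocchio-style blind feedback: the top-k retrieved documents are
# assumed relevant, and their centroid is mixed into the query vector.
# alpha/beta defaults are conventional Rocchio weights (assumed values).

def rocchio_expand(query_vec, top_docs, alpha=1.0, beta=0.75, n_terms=10):
    """Return an expanded query vector using top-ranked docs as pseudo-relevant."""
    centroid = Counter()
    for doc in top_docs:                      # doc: dict of term -> weight
        for term, w in doc.items():
            centroid[term] += w / len(top_docs)
    expanded = Counter({t: alpha * w for t, w in query_vec.items()})
    # add the beta-weighted centroid, keeping only the strongest expansion terms
    for term, w in centroid.most_common(n_terms):
        expanded[term] += beta * w
    return dict(expanded)
```

Original query terms keep their alpha-scaled weights, while terms drawn from the assumed-relevant documents enter with smaller beta-scaled weights, which is what makes the expansion robust to a few non-relevant documents in the top ranks.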
Experiment Setting • Data • 2003 Robust Track data (from TREC 6, 7, 8) • Training Queries • 150 old queries from TREC 6, 7, 8 • Test Queries • 50 very hard queries + 50 new queries
Conclusions • Blind feedback works well on GP-tuned ranking functions • The discovered ranking function, combined with blind feedback, extrapolates well to new queries • The two-stage model responds differently to Desc queries (slightly better) and Long queries
Thank You! Q&A?