
  1. Learning to Estimate Query Difficulty Including Applications to Missing Content Detection and Distributed Information Retrieval. Elad Yom-Tov, Shai Fine, David Carmel, Adam Darlow, IBM Haifa Research Labs, SIGIR 2005

  2. Abstract • Novel learning methods are used for estimating the quality of results returned by a search engine in response to a query. • Estimation is based on the agreement between the top results of the full query and the top results of its sub-queries. • Quality estimates are useful for several applications, including improvement of retrieval, detecting queries for which no relevant content exists in the document collection, and distributed information retrieval.

  3. Introduction (1/2) • Many IR systems suffer from large variance in performance across queries. • Estimating query difficulty is an attempt to quantify the quality of the results returned by a given system for a query. • Reasons for estimating query difficulty: • Feedback to the user • The user can rephrase "difficult" queries. • Feedback to the search engine • To invoke alternative retrieval strategies for different queries • Feedback to the system administrator • To identify queries related to a specific subject and expand the document collection accordingly • For distributed information retrieval

  4. Introduction (2/2) • The observation and motivation: • Queries answered well are those whose query terms agree on most of the returned documents. • Agreement is measured by the overlap between the top results. • Difficult queries are those where: • The query terms cannot agree on the top results, or • Most of the terms do agree except a few outliers. • A TREC query for example: "What impact has the chunnel (the Channel Tunnel) had on the British economy and/or the life style of the British"

  5. Related Work (1/2) • In the Robust track of TREC 2004, systems are asked to rank the topics by predicted difficulty. • The goal is eventually to use such predictions to do topic-specific processing. • Prediction methods suggested by the participants: • Measuring clarity based on the system's score of the top results • Analyzing the ambiguity of the query terms • Learning a predictor using old TREC topics as training data • (Ounis, 2004) showed that an IDF-based predictor is positively related to query precision. • (Diaz, 2004) used the temporal distribution of documents together with their content to improve the prediction of average precision (AP) for a query.

  6. Related Work (2/2) • The Reliable Information Access (RIA) workshop investigated the reasons for system performance variance across queries. • Ten failure categories were identified, four of which are due to emphasizing only partial aspects of the query. • One of the conclusions of this workshop: "…comparing a full topic ranking against ranking based on only one aspect of the topic will give a measure of the importance of that aspect to the retrieved set"

  7. Estimating Query Difficulty • Query terms are defined as the keywords and the lexical affinities. • Features used for learning: • The overlap between each sub-query and the full query • Measured by the κ statistic • The rounded logarithm of the document frequency, log(DF), of each of the sub-queries • Two challenges for learning: • The number of sub-queries is not constant, so a canonical representation is needed. • The sub-queries are not ordered.
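
As a concrete illustration of these per-sub-query features, here is a minimal Python sketch (not the paper's code). The function name and signature are assumptions, a plain top-N intersection count stands in for the full κ statistic, and the base-10 logarithm is an assumption.

```python
import math

def subquery_features(full_top, sub_top, sub_df, n=10):
    """Per-sub-query features (illustrative names and signature).

    full_top, sub_top : ranked lists of document IDs returned for the
                        full query and for this sub-query.
    sub_df            : document frequency of the sub-query.
    The paper measures agreement with the kappa statistic; a plain
    intersection count over the top n results is used here only to
    keep the sketch short.
    """
    overlap = len(set(full_top[:n]) & set(sub_top[:n]))
    log_df = round(math.log10(sub_df)) if sub_df > 0 else 0
    return overlap, log_df
```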

  8. Query Estimator Using a Histogram (1/2) • The basic procedure: • Find the top N results for the full query and for each sub-query. • Build a histogram of the overlaps h(i,j) to form a feature vector. • Values of log(DF) are split into 3 discrete values {0-1, 2-3, 4+}. • h(i,j) counts sub-queries with log(DF)=i and overlap=j. • The rows of h(i,j) are concatenated as a feature vector. • Compute the linear weight vector c for prediction. • For example, suppose a query has 4 sub-queries: log(DF(n))=[0 1 1 2], overlap=[2 0 0 1] → h(i)=[0 0 1 2 0 0 0 1 0]
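
A minimal sketch of this histogram construction follows. It reproduces the slide's example by treating the log(DF) values there directly as bin indices; the function name and the number of overlap bins are assumptions.

```python
import numpy as np

def histogram_features(df_bins, overlaps, n_df_bins=3, n_overlap_bins=3):
    """Concatenated histogram h(i, j) of (log(DF) bin, overlap) pairs.

    df_bins  : per sub-query, the log(DF) value already mapped to one of
               the three bins on the slide ({0-1, 2-3, 4+}); the slide's
               example values are used directly as bin indices here.
    overlaps : per sub-query, overlap with the full query's top results.
    """
    h = np.zeros((n_df_bins, n_overlap_bins), dtype=int)
    for i, j in zip(df_bins, overlaps):
        h[min(i, n_df_bins - 1), min(j, n_overlap_bins - 1)] += 1
    return h.flatten()  # rows concatenated into a single feature vector

# Reproduces the slide's example:
# histogram_features([0, 1, 1, 2], [2, 0, 0, 1]) -> [0 0 1 2 0 0 0 1 0]
```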

  9. Query Estimator Using a Histogram (2/2) • Two additional features • The score of the top-ranked document • The number of words in the query • Estimate the linear weight vector c with the Moore-Penrose pseudo-inverse: c = (H·Hᵀ)⁻¹·H·tᵀ, where H is the matrix whose columns are the feature vectors of the training queries and t is a vector of the target measure (P@10 or MAP) of the training queries. (H and t can be modified according to the objective.)
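
In NumPy terms, the fit and the prediction can be sketched as follows; the variable and function names are assumptions, and np.linalg.pinv supplies the pseudo-inverse.

```python
import numpy as np

def fit_linear_estimator(H, t):
    """Fit the linear weight vector c via the Moore-Penrose pseudo-inverse.

    H : (n_features, n_queries) matrix whose columns are the feature
        vectors of the training queries.
    t : (n_queries,) vector of the target measure (e.g. P@10 or MAP).
    np.linalg.pinv gives the least-squares solution of Hᵀ c ≈ t,
    which equals (H Hᵀ)⁻¹ H t when H Hᵀ is invertible.
    """
    return np.linalg.pinv(H.T) @ t

def predict_quality(c, feature_vec):
    """Predicted quality is the dot product of the weights and the features."""
    return float(c @ feature_vec)
```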

  10. Query Estimator Using a Modified Decision Tree (1/2) • Useful when the data are sparse, i.e., when queries are too short. • A binary decision tree • Pairs of the overlap and log(DF) values of the sub-queries form the features. • Each node consists of a weight vector, a threshold, and a score. • An example tree is given on the slide (figure not transcribed).
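
A rough Python sketch of such a node and of one possible way to score a query with it. The routing rule, the use of the last visited node's score, and the averaging over sub-queries are assumptions of this sketch, not the paper's algorithm.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class TreeNode:
    """A node as described on the slide: weight vector, threshold, score."""
    weights: np.ndarray
    threshold: float
    score: float
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None

def route_pair(node: TreeNode, pair: np.ndarray) -> float:
    """Send one (overlap, log DF) pair down the tree; return the score of
    the last node reached (the routing rule is an assumption)."""
    child = node.left if float(node.weights @ pair) <= node.threshold else node.right
    return node.score if child is None else route_pair(child, pair)

def predict_query(root: TreeNode, pairs) -> float:
    """Average the per-pair scores over all sub-queries (assumed aggregation)."""
    return float(np.mean([route_pair(root, np.asarray(p)) for p in pairs]))
```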

  11. Query Estimator Using a Modified Decision Tree (2/2) • The concept of a Random Forest • Better decision trees can be obtained by training a multitude of trees, each in a slightly different manner or using different data. • The AdaBoost algorithm is applied to resample the training data.

  12. Experiment and Evaluation (1/2) • The IR system is Juru. • Two document collections • TREC-8: 528,155 documents, 200 topics • WT10G: 1,692,096 documents, 100 topics • Four-fold cross-validation • Measured by Kendall's τ coefficient
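
The evaluation metric can be computed directly with SciPy; the function name and the toy numbers in the comment are illustrative only.

```python
from scipy.stats import kendalltau

def evaluate_estimator(predicted, actual):
    """Kendall's tau between the predicted quality of the test queries and
    their measured precision; +1 means the two rankings agree perfectly."""
    tau, _p_value = kendalltau(predicted, actual)
    return tau

# evaluate_estimator([0.3, 0.1, 0.7], [0.35, 0.05, 0.60]) -> 1.0
```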

  13. Experiment and Evaluation (2/2) • Compared with several other algorithms: • Estimation based on the score of the top result • Estimation based on the average score of the top ten results • Estimation based on the standard deviation of the IDF values of the query terms • Estimation based on learning an SVM for regression
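
One of these baselines, the standard deviation of the query terms' IDF values, can be sketched as follows; the IDF formula log(N/df) and the signature are assumptions.

```python
import math
import statistics

def idf_std_baseline(query_terms, doc_freqs, n_docs):
    """Baseline from the slide: standard deviation of the IDF values of
    the query terms (terms absent from the collection are skipped)."""
    idfs = [math.log(n_docs / doc_freqs[t])
            for t in query_terms if doc_freqs.get(t, 0) > 0]
    return statistics.pstdev(idfs) if len(idfs) > 1 else 0.0
```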

  14. Application 1: Improving IR Using Query Estimation (1/2) • Selective automatic query expansion • Adding terms to the query based on terms that appear frequently in the top retrieved documents • Only works for easy queries • The same features are used to train an SVM classifier for this decision. • Deciding which part of the topic should be used • TREC topics contain two parts: a short title and a longer description. • Some topics that are not answered well by the description part are better answered by the title part. • Difficult topics use the title part and easy topics use the description part.
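
A hedged sketch of such a selector using scikit-learn; the kernel choice, the function names, and the exact meaning of the labels are assumptions.

```python
from sklearn.svm import SVC

def train_selector(features, labels):
    """Classifier in the spirit of the slide: the same histogram features
    with a binary label such as 'expansion helped this query' or 'the
    title part beat the description part'."""
    return SVC(kernel="rbf").fit(features, labels)

def use_expansion(selector, feature_vec):
    """True when the classifier predicts the query is easy enough for
    automatic query expansion to help."""
    return bool(selector.predict([feature_vec])[0])
```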

  15. Application 1: Improving IR Using Query Estimation (2/2)

  16. Application 2: Detecting Missing Content (1/2) • Missing content queries (MCQs) are those that have no relevant document in the collection. • Experiment method • 166 MCQs are created artificially from the 400 TREC queries (the 200 TREC topics each provide a title query and a description query). • Ten-fold cross-validation • A tree-based classifier is trained to separate MCQs from non-MCQs. • A query difficulty estimator may or may not be used as a pre-filter of easy queries before the MCQ classifier.
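
A minimal sketch of this pipeline, with sklearn's DecisionTreeClassifier standing in for the paper's tree learner; the depth limit, the threshold on predicted precision, and the function names are assumptions.

```python
from sklearn.tree import DecisionTreeClassifier

def train_mcq_classifier(features, is_mcq):
    """Tree-based MCQ / non-MCQ classifier trained on labeled queries."""
    return DecisionTreeClassifier(max_depth=5).fit(features, is_mcq)

def detect_missing_content(clf, feature_vec, predicted_precision,
                           easy_threshold=0.5, use_prefilter=True):
    """Optional pre-filter: queries with high predicted precision are
    assumed to have relevant content and skip the classifier."""
    if use_prefilter and predicted_precision >= easy_threshold:
        return False  # predicted easy -> not a missing content query
    return bool(clf.predict([feature_vec])[0])
```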

  17. Application 2: Detecting Missing Content (2/2)

  18. Application 3: Merging the Results of Distributed Retrieval (1/2) • It is difficult to rerank documents retrieved from different datasets since each score is local to its specific dataset. • CORI (W. Croft, 1995) is one of the state-of-the-art algorithms for distributed retrieval; it uses an inference network to rank collections. • Applying the estimator to this problem: • A query estimator is trained for each dataset. • The estimated difficulty is used for weighting the scores. • These weighted scores are merged to build the final ranking. • Ten-fold cross-validation • Only minimal information is supplied by the search engine.
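
A minimal sketch of the weighted merge; the data layout and the simple linear scaling of local scores are assumptions of this sketch.

```python
def merge_results(per_dataset_results, per_dataset_estimates):
    """Merge ranked lists from several datasets, scaling each local score
    by the dataset's estimated query quality.

    per_dataset_results   : {dataset: [(doc_id, local_score), ...]}
    per_dataset_estimates : {dataset: estimated quality of this query there}
    """
    merged = [(doc_id, per_dataset_estimates[ds] * score)
              for ds, results in per_dataset_results.items()
              for doc_id, score in results]
    return sorted(merged, key=lambda pair: pair[1], reverse=True)
```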

  19. Application 3: Merging the Results of Distributed Retrieval (2/2) • Selective weighting • All queries are clustered (2-means) based on their difficulty estimates for each of the datasets. • In one cluster the variance of the estimates is small → unweighted scores are better for queries in that cluster. • The difficulty estimates amount to little more than noise when their variance is small.
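
A hedged sketch of this selective-weighting decision using scikit-learn's KMeans; the variance-based rule for picking the weighted cluster and the function name are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def weighted_merge_mask(estimates):
    """Cluster the queries' per-dataset difficulty estimates with 2-means
    and use weighted merging only in the cluster with the larger spread.

    estimates : (n_queries, n_datasets) array of difficulty estimates.
    Returns a boolean array, True where weighted merging should be used.
    """
    estimates = np.asarray(estimates, dtype=float)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(estimates)
    spread = [estimates[labels == k].var() for k in (0, 1)]
    return labels == int(np.argmax(spread))
```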

  20. Conclusions and Future Work • Two methods for learning an estimator of query difficulty are described. • The learned estimator predicts the expected precision of the query by analyzing the overlap between the results of the full query and the results of its sub-queries. • We show that such an estimator can be used for several applications. • Our results show that the quality of query prediction strongly depends on the query length. • One direction for future work is to look for additional features that do not depend on the query length. • Whether more training data can be accumulated in an automatic or semi-automatic manner is left for future research.
