Introduction • Query operations reformulate the original query in order to improve the effectiveness of information retrieval (IR) • Query operations typically involve • Expanding the original query • Reweighting the terms in the expanded query • Using feedback from user actions and/or analysis of the local/global document set
Query Operations • Paper 1: http://doi.acm.org/10.1145/243199.243202 • Paper 2: http://doi.acm.org/10.1145/511446.511489 • Janak Mathuria
Motivation • Word mismatch • Vocabularies of users and authors may vary • Users often search with different words than the ones authors use to express the same concepts • Short queries • Users typically submit very short queries to retrieve information; average query size on the web is less than two words • The keywords in the query alone may not suffice to retrieve relevant information
Traditional Approaches • Relevance feedback • Perhaps the most popular • Expansion and term reweighting based on the user marking certain retrieved documents as relevant • Easy to understand; provides a controlled process to emphasize or de-emphasize terms • Automatic local analysis • Documents retrieved for the original query are used to expand the query • Thus all retrieved documents are assumed to be relevant • Requires access to the document contents and not just the term indices
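As a rough illustration of expansion plus reweighting from relevance feedback, the sketch below applies a Rocchio-style update. The slides do not name a specific formula, so the update rule, the function name `reweight_query`, and the `alpha`/`beta`/`gamma` defaults are all assumptions made for illustration.

```python
from collections import Counter


def reweight_query(query, relevant_docs, nonrelevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style sketch (an assumption, not taken from the slides).

    query: dict mapping term -> weight
    relevant_docs / nonrelevant_docs: lists of token lists
    """
    new_query = Counter()
    # Keep the original query terms, scaled by alpha.
    for term, weight in query.items():
        new_query[term] += alpha * weight
    # Add the mean term frequencies of relevant documents (beta) and
    # subtract those of non-relevant documents (gamma).
    for docs, coeff in ((relevant_docs, beta), (nonrelevant_docs, -gamma)):
        for doc in docs:
            for term, freq in Counter(doc).items():
                new_query[term] += coeff * freq / (len(doc) * len(docs))
    # Drop terms whose weight ended up zero or negative.
    return {t: w for t, w in new_query.items() if w > 0}


expanded = reweight_query(
    {"car": 1.0},
    relevant_docs=[["car", "automobile", "engine"]],
    nonrelevant_docs=[["car", "toy"]],
)
```

Here "automobile" and "engine" enter the query because they occur in the document marked relevant, while "toy" is dropped because it only occurs in the non-relevant one.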
Traditional Approaches • Automatic global analysis • Use global context of a concept to determine similarities between concepts • Concepts can be noun groups (up to ‘n’ adjacent nouns) • Context can be limited to within a certain vicinity of the concept or the entire document • A ranked list of phrasal concepts is generated for a query • A certain number of the top-ranked phrasal concepts are added to expand the query
Local Context Analysis • Proposed by Xu and Croft in the paper “Query Expansion Using Local and Global Document Analysis” (1996) • Combines global and local analysis as follows • Considers passages rather than entire documents, since documents may be very long and may contain multiple unrelated concepts • A standard IR system is used to retrieve a certain number of top-ranked passages • Ranks each concept in the top-ranked passages by its similarity to the query using a form of tf-idf ranking • Uses a certain number of top-ranked concepts to expand the query • Uses context (in the form of passages) and concepts as in global analysis • Uses top-ranked passages for query expansion as in local analysis
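The steps above can be sketched roughly as follows. This is a simplified co-occurrence score, not the exact ranking function from the Xu and Croft paper: the scoring formula, the `delta` smoothing value, the optional `idf` table, and the function names are assumptions, and single tokens stand in for noun-phrase concepts.

```python
import math
from collections import Counter


def rank_concepts(query_terms, passages, idf=None, delta=0.1):
    """Rank candidate concepts by co-occurrence with every query term.

    passages: top-ranked passages as token lists (retrieval is assumed
    to have happened already). idf: optional concept -> idf weight,
    defaulting to 1.0 when a concept is missing (hypothetical input).
    """
    idf = idf or {}
    counts = [Counter(p) for p in passages]
    candidates = {tok for c in counts for tok in c} - set(query_terms)
    scores = {}
    for concept in candidates:
        score = 1.0
        for term in query_terms:
            # Co-occurrence: sum over passages of tf(term) * tf(concept).
            cooc = sum(c[term] * c[concept] for c in counts)
            # The product rewards concepts that co-occur with ALL query terms.
            score *= delta + math.log(1 + cooc) * idf.get(concept, 1.0)
        scores[concept] = score
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


def expand_with_lca(query_terms, passages, k=2, **kwargs):
    """Append the k top-ranked concepts to the original query."""
    return list(query_terms) + rank_concepts(query_terms, passages, **kwargs)[:k]


passages = [
    ["jaguar", "car", "engine"],
    ["jaguar", "car", "speed"],
    ["jaguar", "cat", "zoo"],
]
expanded = expand_with_lca(["jaguar", "car"], passages, k=2)
```

Multiplying the per-term factors, rather than summing them, is the design choice that keeps "cat" and "zoo" low in this example: they co-occur with "jaguar" but never with "car".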
LCA Results • Experiments indicate that in most cases LCA performs better than both global and local analysis • Improvements using global analysis were in the range of 5%-10% • Improvements using local analysis were in the range of 15%-20% • Improvements using LCA were in the range of 20%-25% • Certain concepts may be filtered out by global analysis because they occur very frequently in the corpus, even though they may not occur frequently in the local document set • Surprisingly, very little overlap was found between the expansion terms chosen by local analysis and those chosen by LCA
Expansion Using Query Logs • Proposed by Cui, Wen, Nie and Ma in the paper “Probabilistic Query Expansion Using Query Logs” (2002) • Query logs record which queries were posed and which documents from the result set were selected (that is, clicked) by the user • QL is based on the assumption that • Documents selected by the user were relevant to the query • Terms in these documents are strongly related to the terms in the query • Computes the correlation between the query terms and document terms • Correlations are calculated for concepts as well as words • Uses the correlations to expand queries
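A minimal sketch of the idea, assuming a log of (query terms, clicked document ids) sessions: estimate how strongly each document term is associated with each query term through the clicked documents, then expand with the strongest terms. The estimates below (session fractions and relative term frequency), the log-sum combination across query terms, and the names `term_correlations` and `expand_with_logs` are simplifying assumptions, not the paper's exact formulas.

```python
import math
from collections import Counter, defaultdict


def term_correlations(sessions, documents):
    """Approximate P(doc_term | query_term) via clicked documents.

    sessions: list of (query_terms, clicked_doc_ids)
    documents: doc_id -> token list
    """
    term_sessions = Counter()                # sessions containing each query term
    term_doc_clicks = defaultdict(Counter)   # query term -> doc_id -> click sessions
    for query_terms, clicked in sessions:
        for term in set(query_terms):
            term_sessions[term] += 1
            for doc_id in set(clicked):
                term_doc_clicks[term][doc_id] += 1
    corr = defaultdict(lambda: defaultdict(float))
    for term, clicks in term_doc_clicks.items():
        for doc_id, n in clicks.items():
            tokens = documents[doc_id]
            p_doc = n / term_sessions[term]  # rough P(doc | query term)
            for word, tf in Counter(tokens).items():
                # Rough P(word | doc) as relative term frequency.
                corr[term][word] += p_doc * tf / len(tokens)
    return corr


def expand_with_logs(query_terms, corr, k=2):
    """Add the k document terms most correlated with the whole query."""
    scores = defaultdict(float)
    for term in query_terms:
        for word, p in corr.get(term, {}).items():
            if word not in query_terms:
                scores[word] += math.log(1.0 + p)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return list(query_terms) + ranked[:k]


documents = {
    "d1": ["windows", "microsoft", "os"],
    "d2": ["glass", "window", "frame"],
}
sessions = [(["windows"], ["d1"]), (["windows"], ["d1"]), (["windows"], ["d2"])]
corr = term_correlations(sessions, documents)
expanded = expand_with_logs(["windows"], corr, k=2)
```

Because `term_correlations` only needs the logs and the document terms, it can be run as a batch job, and only the lookup in `expand_with_logs` happens at query time.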
Expansion Using Query Logs • Exhibits the following important properties • Term correlations can be computed offline • Term correlations reflect the preferences of most users • Term correlations evolve as query logs accumulate • Results from the experiments conducted by the authors indicate that • Improvements using LCA were in the range of 20%-25%; this is consistent with the findings of Xu and Croft • Improvements using QL were in the range of 70%-75%
Similarities • Both attempt to improve retrieval through query expansion • Both are variations of relevance feedback and can be fully automated • LCA automates relevance feedback by not taking user selection into account at all; it considers all retrieved passages as relevant • QL automates relevance feedback by taking into account historical user selection to determine relevant documents • Both approaches make use of noun phrases as concepts • QL uses both word and concept correlations • LCA uses only concept correlations
Differences • QL uses information outside the corpus for query expansion • QL is adaptive • If the corpus remains static, LCA will yield the same results; QL will adapt to yield different results • QL captures changes in correlations over time more precisely • QL requires minimal computation at run-time and does not require access to documents • LCA requires computation as well as access to documents at run-time • The QL approach does not use context for query expansion
Critique • The results indicate that QL compares favorably to LCA but • The comparisons appear to be based on queries for which large amounts of query log data are available • LCA will perform better than QL when adequate query logs are not available • In QL, more commonly used correlations may crowd out less commonly used correlations • This may not be material for commercial IR systems but may be material from a pure IR point of view • The assumption that selected documents are relevant may not always hold • A distinction should probably be made between documents that were merely clicked and documents that were clicked and actually read
Conclusions • By making use of information outside the corpus, QL does improve over traditional automated query expansion approaches • QL appears to be easier to understand as well as easier to implement • In the same way that LCA borrows concepts from global analysis to extend local feedback, it may be worthwhile to extend QL by borrowing the following concepts from global analysis • Use context in calculating word and concept correlations • Instead of relying entirely on the availability of adequate query logs, use global analysis to obtain initial concept correlations and then refine these correlations as more query logs become available