Contextual IR
Naama Kraus

Slides are based on the papers:
• Searching with Context. Kraft, Chang, Maghoul, Kumar
• Context-Sensitive Query Auto-Completion. Bar-Yossef and Kraus
The Problem (recap)
• User queries are an imperfect description of their information needs
• Examples:
  • Ambiguous queries: jaguar
  • Overly general queries: haifa
  • Terminology differences (synonyms) between user and corpus: stars vs. planets
Contextual IR
• Leverage context to better understand the user’s information need
• Context types
  • Short-term context: current time and location, recent queries, recent page visits, current page viewed, recent tweets, recent e-mails, …
  • Long-term context (user profile/model): long-term search history, user interests, user demographics (gender, education, …), e-mails, desktop files, …
• Today’s focus: short-term context
Example
Document retrieval – use context (e.g., a recently viewed page) to disambiguate the query jaguar
Searching with Context
Kraft, Chang, Maghoul, Kumar, WWW’06
Searching with Context
• Goal: improve document retrieval
• Capture the user’s recent context
  • A piece of text
  • Extract terms from a page the user is currently viewing, a file the user is currently editing, …
• Proposes three different methods
  • Query rewriting (QR): add terms to the user’s original query
  • Rank biasing (RB): re-rank results
  • Iterative filtering meta-search (IFM): generate sub-queries and aggregate results
Query Rewriting
• Send one simple query to a standard search engine
• Append the top context terms to the original query
  • AND semantics
  • Parameter: how many terms to add
• Example (see the sketch below):
  • Query q; weighted context-term vector (a b c d e), terms ranked by weight
  • q_new = (q a b) for parameter value 2
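A minimal sketch of query rewriting, assuming context terms arrive as a weighted list; the function name and the example context are illustrative, not from the paper:

```python
def rewrite_query(query, context_terms, k=2):
    """Append the k highest-weighted context terms to the query (AND semantics).

    context_terms: list of (term, weight) pairs extracted from the user's
    current context (e.g., the page being viewed).
    """
    top = sorted(context_terms, key=lambda tw: tw[1], reverse=True)[:k]
    return query + " " + " ".join(term for term, _ in top)

# Example: query "jaguar" with context extracted from a car-review page
print(rewrite_query("jaguar", [("car", 0.9), ("dealer", 0.5), ("zoo", 0.1)]))
# -> "jaguar car dealer"
```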
Rank-Biasing
• Send a complex query that contains ranking instructions to the search engine
• Does not change the original result set, only the ranking
• New query definition: <q> = <selection=cat> <optional=persian,2.0>
  • Selection terms – the original query terms; must appear in results
  • Optional terms – context terms with a boost factor (influence the ranking only); the boost is a function of the term’s weight
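A sketch of how such a query could be assembled. The paper’s <selection=…> <optional=…> notation is engine-specific, so this uses Lucene-style syntax (required terms prefixed with +, boosts written as term^weight) purely as an illustration:

```python
def rank_biasing_query(query_terms, context_terms, boost_scale=2.0):
    """Build a query with required selection terms and boosted optional terms.

    query_terms: the original query terms (must appear in every result).
    context_terms: list of (term, weight) pairs; each term's boost is a
    function of its context weight and affects ranking only.
    """
    selection = " ".join("+" + t for t in query_terms)
    optional = " ".join(
        f"{term}^{weight * boost_scale:.1f}" for term, weight in context_terms
    )
    return f"{selection} {optional}"

print(rank_biasing_query(["cat"], [("persian", 1.0)]))
# -> "+cat persian^2.0"
```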
Iterative Filtering Meta-Search
• Intuition: “explore” different ways to express an information need
• Algorithm outline
  1. Generate sub-queries
  2. Send them to a search engine
  3. Aggregate the results
Sub-query Generation
• Use a query template
• Example (see the sketch below):
  • Query q; context = (a, b, c)
  • Sub-queries:
    • q a, q b, q c
    • q a b, q b c
    • q a b c
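In the example, the sub-queries correspond to appending every contiguous window of context terms to the query. A sketch under that reading (the paper describes more general query templates):

```python
def generate_subqueries(query, context_terms):
    """Generate sub-queries by appending each contiguous window of context
    terms to the original query (matches the slide's example)."""
    n = len(context_terms)
    subqueries = []
    for size in range(1, n + 1):              # window sizes 1..n
        for start in range(n - size + 1):     # all windows of that size
            window = context_terms[start:start + size]
            subqueries.append(" ".join([query] + window))
    return subqueries

print(generate_subqueries("q", ["a", "b", "c"]))
# -> ['q a', 'q b', 'q c', 'q a b', 'q b c', 'q a b c']
```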
Ranking and Filtering
• Issue the k sub-queries to a standard search engine
• Obtain results
• Challenge: how to combine, rank, and filter the results?
• Use rank aggregation techniques
Rank Averaging
• A rank aggregation method (one out of many…)
• Given: k lists of top results
• Assign a score to each position in a list
  • E.g., 1 to the first position, 2 to the second position, …
• For each document, average its scores over the k lists
• The final list is constructed by sorting documents by their average scores
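A sketch of rank averaging. One assumption is loudly flagged: the slide does not say how to score a document missing from a list, so this penalizes it with a score one past that list’s end:

```python
from collections import defaultdict

def rank_average(result_lists):
    """Aggregate k ranked lists by averaging position scores
    (1 for the first position, 2 for the second, ...; lower is better)."""
    all_docs = {doc for results in result_lists for doc in results}
    scores = defaultdict(list)
    for results in result_lists:
        # Assumption: a document absent from a list scores one past its end;
        # the slide leaves missing-document handling open.
        position = {doc: pos for pos, doc in enumerate(results, start=1)}
        for doc in all_docs:
            scores[doc].append(position.get(doc, len(results) + 1))
    avg = {doc: sum(s) / len(s) for doc, s in scores.items()}
    return sorted(all_docs, key=avg.get)

print(rank_average([["d1", "d2", "d3"], ["d2", "d1"]]))
# d1 -> (1+2)/2, d2 -> (2+1)/2, d3 -> (3+3)/2; d1/d2 tie ahead of d3
```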
Context-Sensitive Query Auto-Completion
Z. Bar-Yossef and N. Kraus, WWW’11
Query Auto-Completion
• An integral part of the user’s search experience
• Use cases
  • Predict the user’s intended query
  • Save her keystrokes
  • Help the user formulate her information need
Motivating Example
• I am attending WWW 2011
• I need some information about Hyderabad

Current completions for “hyderabad” (by popularity):
• hyderabad
• hyderabad airport
• hyderabad history
• hyderabad maps
• hyderabad india
• hyderabad hotels

Desired (context-aware) completion:
• hyderabad www
MostPopular Completion
• MostPopular is not always good enough
  • User queries follow a power-law distribution
  • A heavy tail of unpopular queries
  • MostPopular is likely to mis-predict when given a small number of keystrokes
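A sketch of the MostPopular baseline, assuming popularity counts come from a query log (the class and sample log below are illustrative):

```python
import bisect

class MostPopularCompleter:
    """Baseline completion: return the most popular logged queries
    that extend the typed prefix."""

    def __init__(self, query_counts):
        # query_counts: dict mapping query -> popularity count from a log
        self.queries = sorted(query_counts)
        self.counts = query_counts

    def complete(self, prefix, k=3):
        # Binary-search the sorted queries for the prefix range,
        # then rank the matches by popularity.
        lo = bisect.bisect_left(self.queries, prefix)
        hi = bisect.bisect_right(self.queries, prefix + "\uffff")
        candidates = self.queries[lo:hi]
        return sorted(candidates, key=self.counts.get, reverse=True)[:k]

log = {"hyderabad": 100, "hyderabad airport": 80, "hyderabad www": 5}
print(MostPopularCompleter(log).complete("hyd"))
# -> ['hyderabad', 'hyderabad airport', 'hyderabad www']
```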
Nearest Completion
(Figure: candidate completions of the prefix “hy” – hyderabad, hyderabad maps, hyderabad airport, hyatt, hyderabad india, hyundai, hydroxycut, hyperbola – alongside the recent context query “www 2011”)
• Idea: leverage recent query context
• Intuition: the user’s intended query is similar to her context query
• Requires a similarity measure between queries (refer to the paper)
Nearest Completion: Framework
• Offline:
  • Expand the completions
  • Index the completions in a repository
• Online:
  1. Expand the context query
  2. Nearest-neighbors search over the candidate completions
  3. Return the top k context-related completions
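A sketch of the online step, assuming completions were expanded offline into term-weight vectors and using cosine similarity as the query-similarity measure; the expansion method itself (detailed in the paper) is stubbed out with hand-written Counter vectors:

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def nearest_completions(prefix, context_vector, expanded_completions, k=3):
    """Rank completions of `prefix` by similarity to the expanded context.

    expanded_completions: dict mapping completion -> its offline-expanded
    term vector (a stand-in for the paper's expansion and indexing)."""
    candidates = [q for q in expanded_completions if q.startswith(prefix)]
    return sorted(
        candidates,
        key=lambda q: cosine(context_vector, expanded_completions[q]),
        reverse=True,
    )[:k]

completions = {
    "hyderabad www": Counter({"www": 2, "conference": 1, "hyderabad": 1}),
    "hyderabad hotels": Counter({"hotel": 2, "hyderabad": 1}),
}
context = Counter({"www": 1, "2011": 1})
print(nearest_completions("hyd", context, completions))
# -> ['hyderabad www', 'hyderabad hotels']
```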
HybridCompletion
• Problem: if the context queries are irrelevant to the current query, NearestCompletion fails to predict the user’s query
• Solution: HybridCompletion – a combination of highly popular and highly context-similar completions
  • Completions that are both popular and context-similar get promoted
  • hybscore(q) = c · Z(simscore(q)) + (1 − c) · Z(popscore(q)), c ∈ [0, 1]
  • A convex combination; Z(·) denotes z-score normalization, which puts the two scores on a comparable scale
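A sketch of the hybrid score over a shared set of candidate completions; the sample score dicts are illustrative:

```python
import statistics

def z_normalize(scores):
    """Z-score normalize a dict of completion -> raw score."""
    values = list(scores.values())
    mean = statistics.mean(values)
    std = statistics.pstdev(values) or 1.0  # avoid division by zero
    return {q: (s - mean) / std for q, s in scores.items()}

def hybrid_ranking(sim_scores, pop_scores, c=0.5):
    """Rank completions by hybscore(q) = c*Z(simscore) + (1-c)*Z(popscore).

    Assumes both dicts score the same candidate completions."""
    z_sim, z_pop = z_normalize(sim_scores), z_normalize(pop_scores)
    hyb = {q: c * z_sim[q] + (1 - c) * z_pop[q] for q in sim_scores}
    return sorted(hyb, key=hyb.get, reverse=True)

sim = {"hyderabad www": 0.9, "hyderabad hotels": 0.2, "hyderabad": 0.1}
pop = {"hyderabad www": 5, "hyderabad hotels": 80, "hyderabad": 100}
print(hybrid_ranking(sim, pop))
```

With c closer to 1 the ranking leans on context similarity; with c closer to 0 it falls back toward MostPopular.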