Personalized Query Expansion for the Web P. Chirita, C. S. Firan, & W. Nejdl Published in SIGIR 07
Introduction • Web query reformulation by exploiting the user's Personal Information Repository (PIR) • The Desktop (as a PIR) is a rich repository of information about the user's interests. • Keyword-, expression-, and summary-based expansion techniques are proposed.
Previous Work • Personalized Search • User profiles: • e.g., user profiling based on browsing history • Requires server-side storage of all personal information, raising privacy concerns. • The actual search algorithm • Build the personalization aspect directly into PageRank (biased toward a target set of pages)
Previous Work • Automatic Query Expansion • Exploiting various social or collection-specific characteristics to generate additional terms • Relevance Feedback Techniques • TF, DF, summarization • Co-occurrence-Based Techniques • Highly co-occurring terms and terms in lexical affinity relationships are added. • Thesaurus-Based Techniques: WordNet • Terms closely related in meaning are added.
Expanding with Local Desktop Analysis • TF • DF • Given the set of top-K relevant Desktop documents: • Generate their snippets, focused on the original search request • Identify the set of candidate terms • Order them according to their associated DF scores • nrWords: the total number of terms in the document • pos: the position of the first appearance of the term
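A minimal sketch of this keyword-selection step, assuming a TF-based weighting of the form (1/2 + 1/2 · TF/maxTF) · log(nrWords/pos), which matches the nrWords and pos definitions above; the function name and tokenization are illustrative, not the paper's implementation:

```python
import math
import re
from collections import Counter

def score_expansion_terms(desktop_docs, query_terms, top_k=5):
    """Score candidate expansion terms from the top-k relevant Desktop
    documents; terms appearing often and early in a document score higher."""
    scores = Counter()
    for doc in desktop_docs[:top_k]:
        words = re.findall(r"[a-z]+", doc.lower())
        if not words:
            continue
        nr_words = len(words)              # total number of terms in the document
        tf = Counter(words)
        max_tf = max(tf.values())
        for term, freq in tf.items():
            if term in query_terms:
                continue                   # do not re-suggest the original query terms
            pos = words.index(term) + 1    # position of the term's first appearance
            scores[term] += (0.5 + 0.5 * freq / max_tf) * math.log(nr_words / pos)
    return [t for t, _ in scores.most_common(10)]
```

The DF variant would instead count, for each candidate term, how many Desktop documents (or snippets) contain it and order the candidates by that count.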
Lexical Compounds • Use simple noun analysis • Sentence Selection • Identify the set of relevant Desktop documents • Generate a summary containing their most important sentences • Threshold
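A rough sketch of the sentence-selection summary under assumed details: sentences are scored by their overlap with the query and kept only above a threshold; the scoring function and the threshold value are assumptions, not the paper's exact procedure:

```python
import re

def summarize(document, query_terms, threshold=0.1):
    """Keep the sentences of a relevant Desktop document whose query
    overlap exceeds a threshold; the kept sentences form the summary."""
    sentences = re.split(r"(?<=[.!?])\s+", document)
    summary = []
    for sentence in sentences:
        words = re.findall(r"[a-z]+", sentence.lower())
        if not words:
            continue
        overlap = sum(w in query_terms for w in words) / len(words)
        if overlap >= threshold:
            summary.append(sentence)
    return " ".join(summary)
```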
Term Co-occurrence Statistics • Cosine Similarity • Mutual Information • Likelihood Ratio
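Sketches of the three co-occurrence coefficients in their standard document-frequency form (the paper's exact normalizations may differ); here df_a, df_b, and df_ab are the document frequencies of the two terms and of their co-occurrence, and n_docs is the collection size:

```python
import math

def _entropy(*counts):
    """Helper term for Dunning's log-likelihood ratio."""
    total = sum(counts)
    return sum(c * math.log(c / total) for c in counts if c > 0)

def cosine_similarity(df_a, df_b, df_ab):
    # CS = DF(a,b) / sqrt(DF(a) * DF(b))
    return df_ab / math.sqrt(df_a * df_b)

def mutual_information(df_a, df_b, df_ab, n_docs):
    # MI = log( N * DF(a,b) / (DF(a) * DF(b)) )
    return math.log(n_docs * df_ab / (df_a * df_b))

def likelihood_ratio(df_a, df_b, df_ab, n_docs):
    """Dunning's log-likelihood ratio over the 2x2 co-occurrence table."""
    k11 = df_ab                          # documents containing both terms
    k12 = df_a - df_ab                   # only the first term
    k21 = df_b - df_ab                   # only the second term
    k22 = n_docs - df_a - df_b + df_ab   # neither term
    return 2 * (_entropy(k11, k12, k21, k22)
                - _entropy(k11 + k12, k21 + k22)
                - _entropy(k11 + k21, k12 + k22))
```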
Experiments • 4 queries were chosen per subject: • One very frequent AltaVista query • One randomly selected log query • One self-selected specific query • One self-selected ambiguous query • Collect the top-5 URLs generated by 20 versions of the algorithm and shuffle them; each subject assessed about 325 documents over the 4 queries • Each result is rated from 0 to 2. • Rankings were assessed with NDCG (Normalized Discounted Cumulative Gain) • A t-test was performed for significance.
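A small sketch of the NDCG computation used for assessment, assuming the common exponential-gain formulation over the 0-2 ratings (the paper may use a different gain or discount):

```python
import math

def ndcg(ratings):
    """NDCG for one ranked result list of graded judgments (0, 1, or 2)."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal > 0 else 0.0

# Example: top-5 results for one query, as rated by a subject
print(ndcg([2, 0, 1, 2, 0]))   # ~0.89; closer to 1 means a better ranking
```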
Algorithms Tested • Baseline: Google • TF, DF • LC, LC[O]: Lexical Compounds, regular and optimized (considering only the top compound) • SS: Sentence Selection • TC[CS], TC[MI], TC[LR]: Term Co-occurrence Statistics using respectively Cosine Similarity, Mutual Information, and Likelihood Ratio as similarity coefficients • WN[SYN], WN[SUB], WN[SUP]: WordNet with synonyms, sub-concepts, and super-concepts
Adaptivity • Query Scope • Query Clarity: ClarityScore = Σ_w P(w|Q) · log( P(w|Q) / P(w) ) • P(w|Q): the probability of the word w within the submitted query • P(w): the probability of w within the entire collection of documents
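A minimal sketch of the two adaptivity signals, assuming maximum-likelihood estimates for P(w|Q) and P(w); the function names and the base of the logarithm are assumptions:

```python
import math
from collections import Counter

def query_clarity(query_terms, collection_tokens):
    """Clarity = sum over query words of P(w|Q) * log(P(w|Q) / P(w));
    higher values indicate a less ambiguous query."""
    q_tf, q_len = Counter(query_terms), len(query_terms)
    c_tf, c_len = Counter(collection_tokens), len(collection_tokens)
    clarity = 0.0
    for w, c in q_tf.items():
        p_w_q = c / q_len                 # probability of w within the submitted query
        p_w = c_tf.get(w, 0) / c_len      # probability of w within the entire collection
        if p_w > 0:
            clarity += p_w_q * math.log2(p_w_q / p_w)
    return clarity

def query_scope(docs_matching_query, total_docs):
    """Scope penalizes queries matched by a large fraction of the collection."""
    return -math.log(docs_matching_query / total_docs)
```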
Query Formulation Process • The newly added terms are more likely to convey information about the user's search goals • Give more weight to the new keywords when building the expanded query
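A hypothetical illustration of weighting the new keywords more heavily than the original ones when the expanded query is issued; the Lucene-style term^boost syntax and the boost value are assumptions, not the paper's formulation:

```python
def formulate_query(original_terms, expansion_terms, boost=1.5):
    """Build an expanded query string in which the newly added keywords
    carry a higher weight than the original query terms."""
    weighted = [f"{t}^{boost}" for t in expansion_terms]
    return " ".join(list(original_terms) + weighted)

# formulate_query(["jaguar"], ["car", "automobile"]) -> 'jaguar car^1.5 automobile^1.5'
```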
Application to the Project • News articles collected by the user can be treated as the user's Desktop, so their algorithms can be applied to our system.