Applying Diversity Metrics to Improve the Selection of Web Search Term Refinements

CS224N 2008
Tague Griffith, Jan Pfeifer

Web Search Refinements

Problem
  Redundant refinements in a limited space
  Technical senses dominate others:
    Java island vs. Java programming language
    Amazon river/rain forest vs. Amazon the company
  What happens with too much diversity:
    Amazon grill houston
    Embraer ERJ 145 Amazon

CBC Word Sense Similarity
  Similarity of terms measured by feature vectors
  Features are a combination of co-occurring words with their syntactic context
    “wine”: [“sip _”+“Verb-Object”, ...]
  Data from Wikipedia corpus
  Problems:
    Little overlap between web data and Wikipedia data
    Hyponym siblings too similar, but good refinements:
      “planet jupiter” and “planet earth”
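A minimal sketch of comparing terms by feature vectors, as described on the slide above. The slide does not say which similarity measure or feature weighting was used; cosine similarity over raw co-occurrence counts is an assumption here, and the feature counts for "wine", "beer" and "java" are invented for illustration.

```python
import math
from collections import Counter
from typing import Tuple

# A feature pairs a co-occurring word pattern with its syntactic context,
# e.g. ("sip _", "Verb-Object") for the term "wine".
Feature = Tuple[str, str]


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse feature vectors (assumed measure)."""
    dot = sum(weight * v[feature] for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical counts, not taken from the Wikipedia corpus.
wine = Counter({("sip _", "Verb-Object"): 3, ("drink _", "Verb-Object"): 5})
beer = Counter({("drink _", "Verb-Object"): 4, ("brew _", "Verb-Object"): 2})
java = Counter({("compile _", "Verb-Object"): 6})

wine_beer = cosine(wine, beer)  # shares the "drink _" feature
wine_java = cosine(wine, java)  # shares no features
```

Terms that share syntactic contexts score higher than terms that share none, which is also why hyponym siblings such as “planet jupiter” and “planet earth” can look nearly identical under this kind of measure.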

Web Semantic Similarity
  Similarity as a function of web search engine results
  Maximum Marginal Relevance (MMR) greedy algorithm
    MMR = argmax_x { (1 - a) popularity(x) + a diversity(x) }
    x = candidate refinement
    popularity(x) given by recent search logs
    diversity(x) given by overlapping search results
  Clustering of terms demonstrates validity
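The greedy MMR selection above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the slide only says diversity(x) comes from overlapping search results, so defining it as 1 minus the largest Jaccard overlap with an already-selected refinement is an assumption, as are the function names, the popularity scores and the result URLs.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two result sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diversity(x: str, selected: List[str], results: Dict[str, Set[str]]) -> float:
    """Assumed definition: 1 minus the largest overlap with any selected refinement."""
    if not selected:
        return 1.0
    return 1.0 - max(jaccard(results[x], results[s]) for s in selected)


def select_refinements(
    popularity: Dict[str, float],
    results: Dict[str, Set[str]],
    a: float,
    k: int,
) -> List[str]:
    """Greedily pick k refinements, each maximizing (1 - a) * popularity + a * diversity."""
    selected: List[str] = []
    remaining = set(popularity)
    while remaining and len(selected) < k:
        best = max(
            sorted(remaining),  # sorted so that ties break deterministically
            key=lambda x: (1 - a) * popularity[x] + a * diversity(x, selected, results),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical data: popularity normalized to [0, 1], results as sets of URLs.
popularity = {"java download": 1.0, "java update": 0.9, "java island": 0.4}
results = {
    "java download": {"u1", "u2", "u3"},
    "java update": {"u1", "u2", "u4"},
    "java island": {"u7", "u8", "u9"},
}

by_popularity = select_refinements(popularity, results, a=0.0, k=2)
with_diversity = select_refinements(popularity, results, a=0.8, k=2)
```

With a = 0.0 the two most popular refinements win even though their results overlap; with a = 0.8 the second slot goes to "java island", whose results share nothing with the first pick.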

Tools: demo http://abstract.homelinux.org:9240/janpf/fp/diversity_demo.php?term=target


AB Editorial Test
  0.0, 0.3 and 0.8 diversity
  Evaluate utility of refinements
  Scale: definitely better, slightly better, same
  17 editors
  Mixed results, with high variability

Results
  Problems with increased diversity:
    Editors penalized long refinements
    Spam and adult terms have “artificial” diversity in web semantic similarity
    More mixed-language results
    Esoteric refinements
  Refinement selection should include:
    Popularity feature
    Diversity feature
    Length feature
    Category classification feature (spam, adult, etc.)
