
Applying Diversity Metrics to Improve the Selection of Web Search Term Refinements

CS224N 2008. Tague Griffith, Jan Pfeifer.


Presentation Transcript


  1. CS224N 2008. Tague Griffith, Jan Pfeifer. Applying Diversity Metrics to Improve the Selection of Web Search Term Refinements

  2. Web Search Refinements

  3. Problem
     - Redundant refinements in a limited space
     - Technical senses dominate others:
       - Java island vs. Java programming language
       - Amazon river/rain forest vs. Amazon the company
     - What happens with too much diversity:
       - Amazon grill houston
       - Embraer ERJ 145 Amazon

  4. CBC Word Sense Similarity
     - Similarity of terms measured by feature vectors
     - Features are a combination of co-occurring words with their syntactic context, e.g. “wine”: [“sip _”+“Verb-Object”, ...]
     - Data from Wikipedia corpus
     - Problems:
       - Little overlap between web data and Wikipedia data
       - Hyponym siblings are too similar, but are good refinements: “planet jupiter” and “planet earth”
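To make the feature-vector idea on slide 4 concrete, here is a minimal sketch. The slides do not say which similarity measure was used, so cosine similarity is an assumption here, and every feature and weight other than the “sip _” / “Verb-Object” pair is made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical feature vectors: each key pairs a co-occurring word pattern
# with its syntactic context; each value is an invented association weight.
wine = {("sip _", "Verb-Object"): 2.0, ("pour _", "Verb-Object"): 1.0}
beer = {("sip _", "Verb-Object"): 1.0, ("pour _", "Verb-Object"): 2.0}
java = {("compile _", "Verb-Object"): 3.0}

print(cosine_similarity(wine, beer))  # terms sharing contexts score high
print(cosine_similarity(wine, java))  # terms with no shared context score 0.0
```

Under this measure, hyponym siblings such as “planet jupiter” and “planet earth” would share most of their contexts and so look nearly identical, which is the problem the slide describes.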

  5. Web Semantic Similarity
     - Similarity as a function of web search engine results
     - Maximum Marginal Relevance (MMR) greedy algorithm:
       MMR = argmax_x { (1 - a) * popularity(x) + a * diversity(x) }
       - x = candidate refinement
       - popularity(x) given by recent search logs
       - diversity(x) given by overlapping search results
     - Clustering of terms demonstrates validity
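The greedy MMR selection on slide 5 can be sketched as follows. The slides only say that diversity(x) comes from overlapping search results, so the Jaccard-based definition below (one minus the largest overlap with any refinement already chosen) is an assumption, as are the function names and the toy popularity scores and result sets.

```python
def jaccard(a, b):
    """Overlap between two sets of result URLs."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def diversity(candidate, selected, results):
    """Assumed definition: 1 minus the largest overlap with a chosen refinement."""
    if not selected:
        return 1.0
    return 1.0 - max(jaccard(results[candidate], results[s]) for s in selected)


def select_refinements(popularity, results, a, k):
    """Greedily pick k refinements, each maximizing the MMR score."""
    selected = []
    remaining = set(popularity)
    while remaining and len(selected) < k:
        best = max(
            sorted(remaining),  # sorted only to make tie-breaking deterministic
            key=lambda x: (1 - a) * popularity[x]
            + a * diversity(x, selected, results),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical data: popularity would come from search logs, result sets from
# a search engine; both are invented here.
popularity = {"java download": 0.9, "java update": 0.8, "java island": 0.3}
results = {
    "java download": {"u1", "u2", "u3"},
    "java update": {"u1", "u2", "u4"},
    "java island": {"u7", "u8", "u9"},
}

print(select_refinements(popularity, results, a=0.0, k=2))
print(select_refinements(popularity, results, a=0.8, k=2))
```

With a = 0.0 the selection is driven purely by popularity, so the two technical senses win. With a = 0.8 the second slot goes to the refinement whose results overlap least with the first pick.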

  6. Tools: demo
     - http://abstract.homelinux.org:9240/janpf/fp/diversity_demo.php?term=target

  7. Tools: demo

  8. Tools: demo

  9. AB Editorial Test
     - 0.0, 0.3 and 0.8 diversity
     - Evaluate utility of refinements
     - Scale: definitely better, slightly better, same
     - 17 editors
     - Mixed results, with high variability

  10. Results
      - Problems with increased diversity:
        - Editors penalized long refinements
        - Spam and adult terms have “artificial” diversity in web semantic similarity
        - More mixed-language results
        - Esoteric refinements
      - Refinement selection should include:
        - Popularity feature
        - Diversity feature
        - Length feature
        - Category classification feature (spam, adult, etc.)
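One way the four features recommended on the final slide could be combined is sketched below. This is purely illustrative: the slides do not give a scoring formula, so the linear form, the weights, the two-word length allowance, and the choice to treat category as a hard filter are all assumptions.

```python
# Hypothetical category labels that a classifier might assign.
BLOCKED_CATEGORIES = {"spam", "adult"}


def refinement_score(popularity, diversity, text, category,
                     a=0.3, length_penalty=0.05):
    """Score one candidate refinement from the four suggested features."""
    # Category feature, used here as a hard filter (an assumption).
    if category in BLOCKED_CATEGORIES:
        return float("-inf")
    # Length feature: penalize each word beyond an assumed two-word allowance.
    extra_words = max(0, len(text.split()) - 2)
    # Popularity and diversity are mixed as in the MMR formula.
    return (1 - a) * popularity + a * diversity - length_penalty * extra_words


print(refinement_score(0.5, 0.5, "java island", "general"))
print(refinement_score(0.5, 0.5, "java island indonesia travel guide", "general"))
print(refinement_score(0.9, 1.0, "java free prizes", "spam"))
```

In practice the weights would need tuning against editorial judgments such as those from the AB test.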
