Mining Anchor Text for Query Refinement Reiner Kraft and Jason Zien IBM Almaden Research Center

Presentation Transcript


  1. Mining Anchor Text for Query Refinement • Reiner Kraft and Jason Zien, IBM Almaden Research Center • Mark Strohmaier

  2. Problem Motivation • 23% of search queries are single-term • Expanding the query can lead to more accurate searches • Previous studies indicate that anchor text is statistically similar to search queries • Can this similarity be exploited to improve search queries?

  3. What is anchor text? • <a href="http://www.example.com/">This is the anchor text</a> • A destination page can have multiple links pointing to it • Taken together, the anchor texts of those links give a view of the destination page • Naïve approach: • Find links whose anchor text is similar to the query • Return the links' destination pages to the user
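
The naïve approach can be sketched in Python with the standard-library `html.parser` module. This is a minimal illustration, not the authors' code: it assumes pages arrive as raw HTML strings and that "similar to the query" means "contains the query term exactly". The class, function names, and sample URLs are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (anchor text, href) pairs from a single HTML page."""

    def __init__(self):
        super().__init__()
        self.pairs = []    # collected (anchor_text, href) pairs
        self._href = None  # href of the <a> element currently open, if any
        self._text = []    # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalize whitespace and case so identical anchors group together.
            text = " ".join("".join(self._text).split()).lower()
            if text:
                self.pairs.append((text, self._href))
            self._href = None


def collect_anchors(pages):
    """Map each destination URL to the anchor texts of links pointing at it."""
    by_dest = defaultdict(list)
    for html in pages:
        parser = AnchorCollector()
        parser.feed(html)
        parser.close()
        for text, href in parser.pairs:
            by_dest[href].append(text)
    return by_dest


def naive_lookup(by_dest, query):
    """Naive approach: destinations with at least one anchor containing the query term."""
    term = query.lower()
    return sorted(dest for dest, texts in by_dest.items()
                  if any(term in text.split() for text in texts))


if __name__ == "__main__":
    pages = [
        '<a href="/hr/benefits">Employee Benefits</a> <a href="/hr">HR home</a>',
        '<p>See <a href="/hr/benefits">benefits enrollment</a></p>',
    ]
    by_dest = collect_anchors(pages)
    print(by_dest["/hr/benefits"])            # ['employee benefits', 'benefits enrollment']
    print(naive_lookup(by_dest, "benefits"))  # ['/hr/benefits']
```

Grouping by destination is what lets many short anchors "give a view" of one page; the lookup itself ignores quality signals, which is where the problems listed next come from.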

  4. Problems with the naïve approach • High term frequency is not directly related to page quality • Repeated terms may lead to unnatural queries • IDF (inverse document frequency) is not necessarily relevant • The same anchor text may appear many times

  5. Methods of Query Refinement • Weighting the number of occurrences • The weight depends on the type of anchor text • Number of terms in the anchor text • A smaller number of terms is better • Number of characters in the anchor text • More concise queries are better
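
A weighted occurrence count per distinct anchor text can be sketched as follows. This is an illustration under stated assumptions: the link types `cross_site` and `intra_site` and their weights are hypothetical placeholders, because the slide says only that the weight depends on the type of anchor text.

```python
from collections import Counter

# Hypothetical link types and weights: the slide says only that the weight
# depends on the type of anchor text, not which types or values were used.
TYPE_WEIGHTS = {"cross_site": 1.0, "intra_site": 0.5}


def weighted_counts(links, weights=TYPE_WEIGHTS):
    """Weighted occurrence count per distinct (normalized) anchor text.

    links: iterable of (anchor_text, link_type) pairs.
    Link types missing from `weights` contribute a default weight of 1.0.
    """
    wcount = Counter()
    for text, link_type in links:
        key = " ".join(text.lower().split())  # normalize case and whitespace
        wcount[key] += weights.get(link_type, 1.0)
    return wcount


if __name__ == "__main__":
    links = [
        ("Employee Benefits", "cross_site"),
        ("employee  benefits", "intra_site"),
        ("HR home", "intra_site"),
    ]
    print(dict(weighted_counts(links)))  # {'employee benefits': 1.5, 'hr home': 0.5}
```

Down-weighting some link types is one way to keep a single heavily self-linked page from dominating the counts; the actual weighting scheme would need to come from the paper.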

  6. Benefits of Anchor Text • There is much less anchor text than document text • Pages can have many incoming links • Refined anchor text can capture a degree of site popularity

  7. Mining Anchor Text • The initial web crawl covered 33 million links on the IBM intranet • Additionally, roughly 350,000 queries were analyzed • Anchor texts and queries showed a similar relationship between length and number of occurrences
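
To make the length comparison concrete, a small helper can compute what fraction of anchor texts (or queries) have each length in terms. This is a minimal sketch assuming whitespace tokenization; the sample strings are made up and are not data from the study.

```python
from collections import Counter


def length_distribution(texts):
    """Fraction of non-empty texts having each length, measured in terms."""
    lengths = Counter(len(t.split()) for t in texts if t.split())
    total = sum(lengths.values())
    if total == 0:
        return {}
    return {n: lengths[n] / total for n in sorted(lengths)}


if __name__ == "__main__":
    # Made-up samples for illustration only, not data from the study.
    anchors = ["benefits", "employee benefits", "hr home", "travel expense reimbursement"]
    queries = ["benefits", "payroll", "hr home", "expense report"]
    print(length_distribution(anchors))  # {1: 0.25, 2: 0.5, 3: 0.25}
    print(length_distribution(queries))  # {1: 0.5, 2: 0.5}
```

Normalizing to fractions lets a 33-million-link crawl and a 350,000-query log be compared on the same scale.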

  8. Pre-processing Summaries • Query refinement is sensitive to the number of terms • Too few may not lead to much improvement • Too many may lead to overspecialization • The best results were obtained with MAXCOUNT = 3
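
The slide does not define MAXCOUNT precisely. A minimal sketch, assuming it caps the number of terms an anchor summary may contain, could filter summaries like this (the function name `preprocess` is hypothetical):

```python
MAXCOUNT = 3  # the value the slide reports as giving the best results


def preprocess(summaries, maxcount=MAXCOUNT):
    """Normalize anchor summaries and drop those longer than `maxcount` terms.

    Assumption: MAXCOUNT is treated here as an upper bound on the number of
    terms in a summary; the slide does not define it precisely.
    """
    kept = []
    for summary in summaries:
        terms = summary.lower().split()
        if 0 < len(terms) <= maxcount:
            kept.append(" ".join(terms))
    return kept


if __name__ == "__main__":
    summaries = [
        "Employee Benefits",
        "benefits",
        "click here to see all of the employee benefits",
    ]
    print(preprocess(summaries))  # ['employee benefits', 'benefits']
```

Under this reading, a small cap removes long, sentence-like anchors that would overspecialize a query, while still leaving multi-term phrases that can add information.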

  9. Studies Performed • Three different approaches were compared • ANCHOR • Ranked anchor text refinement • DOC.SW • Ranks candidate refinements by the most frequently occurring 2- and 3-term phrases in the documents • DOC • Similar to DOC.SW, but stop words are not counted
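
The document-based baselines can be approximated by counting 2- and 3-term phrases. This is a sketch, not the study's implementation: it assumes "not counting stop words" means removing stop words before phrases are formed, and the tiny stop-word list and sample documents are illustrative.

```python
from collections import Counter

# A tiny illustrative stop-word list (an assumption; any standard list would do).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def top_phrases(docs, k=5, keep_stop_words=True):
    """Count 2- and 3-term phrases across documents; return the k most frequent.

    keep_stop_words=True  approximates DOC.SW (every token counts).
    keep_stop_words=False approximates DOC (stop words removed first).
    """
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        if not keep_stop_words:
            tokens = [t for t in tokens if t not in STOP_WORDS]
        for n in (2, 3):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common(k)


if __name__ == "__main__":
    docs = ["enroll in the benefits plan", "the benefits plan for employees"]
    print(top_phrases(docs, k=3))
    # [('the benefits', 2), ('benefits plan', 2), ('the benefits plan', 2)]
    print(top_phrases(docs, k=1, keep_stop_words=False))
    # [('benefits plan', 2)]
```

The example shows why the stop-word variant matters: with stop words kept, phrases such as "the benefits" compete with content-bearing phrases for the top slots.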

  10. Ranking Anchor Texts • The results are ranked based on • WCOUNT score • Number of terms in the anchor summary • Number of characters in the anchor summary
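
The three ranking criteria can be combined into a single sort key. A minimal Python sketch, assuming a higher WCOUNT ranks first and ties are broken by fewer terms and then fewer characters (following the preference for concise refinements stated on slide 5). The candidate filter (every query term must appear, and the summary must differ from the query) and all scores are hypothetical.

```python
def rank_refinements(wcount, query, k=10):
    """Rank candidate refinements for a query.

    wcount: mapping of anchor summary -> WCOUNT score.
    Assumed order: higher WCOUNT first, then fewer terms, then fewer characters.
    """
    q_terms = query.lower().split()
    q_text = " ".join(q_terms)
    candidates = [s for s in wcount
                  if s != q_text and all(t in s.split() for t in q_terms)]
    candidates.sort(key=lambda s: (-wcount[s], len(s.split()), len(s)))
    return candidates[:k]


if __name__ == "__main__":
    wcount = {  # hypothetical scores
        "benefits": 9.0,
        "employee benefits": 4.0,
        "benefits enrollment": 4.0,
        "retiree benefits": 4.0,
        "benefits enrollment form": 6.0,
        "payroll": 7.0,
    }
    print(rank_refinements(wcount, "benefits"))
    # ['benefits enrollment form', 'retiree benefits', 'employee benefits', 'benefits enrollment']
```

A tuple key keeps the criteria strictly ordered: the term and character counts only matter among summaries with equal WCOUNT.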

  11. Comparison of Methods • A second comparison tested 22 different queries • QUERYLOG processes user queries and dynamically refines them based on previous queries, in a manner similar to ANCHOR
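
A QUERYLOG-style refiner can reuse the same idea with logged queries in place of anchor text. The sketch below is an assumption about how such a component might look, not the study's implementation: the class name, the exact-token matching rule, and the tie-breaking order are illustrative.

```python
from collections import Counter


class QueryLogRefiner:
    """Hypothetical QUERYLOG-style refiner: logged queries play the role of anchor text.

    Counts are updated each time a query is logged, so suggestions change over time.
    Ranking (assumed): higher count first, then fewer terms, then fewer characters.
    """

    def __init__(self):
        self.counts = Counter()

    def log(self, query):
        self.counts[" ".join(query.lower().split())] += 1

    def refine(self, query, k=5):
        q_terms = query.lower().split()
        q_text = " ".join(q_terms)
        candidates = [s for s in self.counts
                      if s != q_text and all(t in s.split() for t in q_terms)]
        candidates.sort(key=lambda s: (-self.counts[s], len(s.split()), len(s)))
        return candidates[:k]


if __name__ == "__main__":
    r = QueryLogRefiner()
    for q in ["benefits", "benefits enrollment", "retiree benefits", "benefits enrollment"]:
        r.log(q)
    print(r.refine("benefits"))  # ['benefits enrollment', 'retiree benefits']
    r.log("retiree benefits")
    r.log("retiree benefits")
    print(r.refine("benefits"))  # ['retiree benefits', 'benefits enrollment']
```

Because counts change on every call to `log`, the suggestions for the same query shift as the log grows, which is the dynamic behavior the slide describes.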

  12. Conclusions • Using anchor text leads to better results than applying similar methods to the document collection • A similar approach can be used to refine user search queries as well

  13. Future Directions • Broadening search queries • Lexical analysis, rather than purely textual matching • Pre- and post-anchor text (the text surrounding a link)
