
Redeeming Relevance for Subject Search in Citation Indexes


Presentation Transcript


  1. Redeeming Relevance for Subject Search in Citation Indexes Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu

  2. Citation Indexes • Valuable tools for research • Examples: SCI, CiteSeer, arXiv, CiteBase • Permit traversal of citation networks • Identify significant contributions • Subject search is often the entry point

  3. Subject search • Query similarity • Citation frequency

  4. Citation frequency • PageRank • Example: 2 papers • similar in terms of relevance • published at roughly the same time • Paper A cited only by its author • Paper B cited 10 times by other authors • Paper B likely to have greater priority for reading
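
  The citation-frequency idea behind metrics like PageRank can be sketched in a few lines of Python. This is a minimal illustration, not the talk's implementation; the toy citation graph, damping factor, and iteration count are all assumptions.

      # Minimal PageRank sketch over a toy citation graph. Dangling papers
      # (those that cite nothing here) simply leak rank mass, which is fine
      # for illustration.
      def pagerank(links, damping=0.85, iterations=50):
          """links: dict mapping each paper to the list of papers it cites."""
          nodes = set(links)
          for cited in links.values():
              nodes.update(cited)
          rank = {n: 1.0 / len(nodes) for n in nodes}
          for _ in range(iterations):
              new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
              for paper, cited in links.items():
                  if cited:  # spread this paper's rank over its references
                      share = damping * rank[paper] / len(cited)
                      for target in cited:
                          new_rank[target] += share
              rank = new_rank
          return rank

      # Paper A is cited once (by its own author); paper B by four others.
      citations = {"A-followup": ["A"], "C": ["B"], "D": ["B"],
                   "E": ["B"], "F": ["B"]}
      ranks = pagerank(citations)
      assert ranks["B"] > ranks["A"]  # B gets higher reading priority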

  5. Problem • Boolean retrieval metrics • Many top documents are not relevant • Effective for Web searches • Any one of several popular pages will do • Not so for users of citation indexes

  6. Reference Directed Indexing (RDI) • Objective: To combine strong measures of both relevance and significance in a single metric • Intuition: The opinions of authors who cite a document effectively distinguish both what a document is about and how important a contribution it makes • Similar to the use of anchor text to index Web documents

  7. Example • Paper by Ron Azuma and Gary Bishop • On tracking the heads of users in augmented reality systems • Head tracking is necessary in order to generate the correct perspective view

  8. “Azuma et al. [2] developed a 6DOF tracking system using linear accelerometers and rate gyroscopes to improve the dynamic registration of an optical beacon ceiling tracker.” • A single reference to Azuma

  9. Summarizes the Azuma paper as… • A six degrees of freedom tracking system • With additional details: • Improves dynamic registration • Optical beacon ceiling tracker • Linear accelerometers • Rate gyroscopes

  10. Leveraging multiple citations • For any document cited more than once… • We can compare the words of all authors • Terms used by many referrers make good index terms for a document
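
  As a sketch of this idea (the tokenization and stopword list here are assumptions, not the paper's): count how many distinct citing documents use each term in their referential text, and keep terms used by several referrers.

      from collections import Counter

      STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to"}

      def index_terms(citation_contexts, min_referrers=2):
          """citation_contexts: one string of referential text per citing paper."""
          referrer_counts = Counter()
          for context in citation_contexts:
              # A set, so each referrer votes at most once per term.
              terms = {t.strip(".,[]()").lower() for t in context.split()}
              referrer_counts.update(terms - STOPWORDS)
          return {t for t, n in referrer_counts.items() if n >= min_referrers}

      contexts = [
          "Azuma et al. [2] developed a 6DOF tracking system",
          "landmark tracking for determining head pose in augmented reality",
          "several augmented reality environments use head tracking",
      ]
      print(index_terms(contexts))  # includes 'tracking', 'augmented', 'reality'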

  11. “Azuma et al. [2] developed a 6DOF tracking system using linear accelerometers …” • “Azuma and Holloway analyze sources of registration and tracking errors in AR systems [2, 11, 12].” • “Whereas several augmented reality environments are known (cf. State et al. [1], Azuma and Bishop [3]) …” • “… e.g. landmark tracking for determining head pose in augmented reality [2, 3, 4, 5]” • Repeated use of “tracking” and “augmented reality”

  12. A voting technique • RDI treats each citing document as a voter • The presence of a query term in referential text is a vote of “yes” • The absence of that term, a “no” • The documents with the most votes for the query terms rank highest
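
  A minimal sketch of this voting scheme (the data layout and tokenization are assumptions): each citing document casts one yes/no vote per query term, and documents are sorted by total votes.

      def rdi_rank(query_terms, index):
          """index: dict mapping doc id -> list of referential texts, one per citer."""
          scores = {}
          for doc_id, contexts in index.items():
              votes = 0
              for context in contexts:
                  words = set(context.lower().split())
                  # One vote per (citer, query term) pair present in the text.
                  votes += sum(1 for t in query_terms if t.lower() in words)
              scores[doc_id] = votes
          return sorted(scores, key=scores.get, reverse=True)

      index = {
          "azuma95": ["6DOF tracking system for augmented reality",
                      "head tracking in augmented reality"],
          "other99": ["software architecture diagrams"],
      }
      print(rdi_rank(["tracking", "reality"], index))  # ['azuma95', 'other99']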

  13. Related Work • McBryan – World Wide Web Worm • Brin & Page – Google • Chakrabarti et al. – CLEVER • Mendelzon et al. – TOPIC • Bharat et al. – Hilltop • Craswell et al. – Effective Site Finding

  14. Contributions • Application to scientific literature • “Anchor text” for unrestricted subject search • “Anchor text” for combining measures of relevance and significance

  15. Rosetta • Experimental system in which we implemented RDI • Term weighting metric • Ranking metric
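
  One term-weighting and ranking formulation consistent with the voting scheme of slide 12 (an assumption, not necessarily the exact metric on the slide):

      % Assumed formulation, consistent with the voting description on slide 12.
      % Term weight: the number of citing documents whose referential text
      % for d contains the term t.
      w(t, d) = \bigl|\{\, c : c \text{ cites } d \text{ and } t \in \mathrm{reftext}(c, d) \,\}\bigr|
      % Ranking: a document's score for query q is its total vote count.
      \mathrm{score}(q, d) = \sum_{t \in q} w(t, d)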

  16. Experiments • 10,000 research papers • Gathered from CiteSeer • Each document cited at least once • Evaluated • Retrieval precision • Impact of search results

  17. Comparison system • We compared Rosetta to a traditional content-based retrieval system • Comparison system uses TFIDF for term weighting • And the cosine ranking metric
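
  In the standard formulation (assumed here, with N the collection size, tf the term frequency, and df the document frequency):

      % Standard TFIDF weight for term t in document d.
      w_{t,d} = \mathrm{tf}_{t,d} \cdot \log\frac{N}{\mathrm{df}_t}
      % Cosine similarity between query q and document d.
      \cos(q, d) = \frac{\sum_{t} w_{t,q}\, w_{t,d}}
                        {\sqrt{\sum_{t} w_{t,q}^{2}}\;\sqrt{\sum_{t} w_{t,d}^{2}}}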

  18. Indexing • Indexed collection in both Rosetta and the TFIDF/Cosine system • Rosetta indexed documents based on references to them • The TFIDF/Cosine system indexed documents based on words used within them • Required that each document was cited at least once to ensure that both systems indexed the same set of documents

  19. As referential text, Rosetta used CiteSeer’s “contexts of citation”

  21. Queries • 32 queries in our test set • Queries were key terms extracted from the “Keywords” sections of documents • Queries extracted from a sample of 24 documents • The document from which a key term was extracted established the topic of interest

  22. Queries

  23. Relevance assessments • The topic of interest for a query was the idea identified by the corresponding key term • Relevant documents directly addressed this same topic • Example: • Query: “force feedback” • Relevant: Work on providing a sense of touch in VR applications or other computer simulations

  24. Retrieval interface • Meta-interface • Queried both systems • Used top 10 search results from each system • Integrated all 20 search results • Presented them in random order • No way to determine the source of a retrieved document
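
  The pooling step can be sketched as follows (function and variable names are illustrative; the talk does not describe the implementation):

      import random

      def pooled_results(rosetta_top10, tfidf_top10):
          """Merge two top-10 lists and shuffle them so assessors cannot
          tell which system returned which document."""
          pooled = list(dict.fromkeys(rosetta_top10 + tfidf_top10))  # dedupe, keep order
          random.shuffle(pooled)
          return pooled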

  25. Experimental summary • 32 queries drawn from document key terms • Document identified the topic of interest • Relevant documents addressed the same topic • Used a meta-search interface • Evaluated top 10 from both systems • Origin of search results hidden

  26. Precision at top 10 • On average RDI provided a 16.6% improvement over TFIDF/Cosine • 1 or 2 more relevant documents in the top 10 • Result is significant • t-test of the mean paired difference • Test statistic = 3.227 • Significant at a confidence level of 99.5%
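
  The significance test can be reproduced with SciPy's paired t-test; the precision values below are made-up placeholders, not the study's data.

      from scipy import stats

      # Hypothetical per-query precision@10 values (illustration only).
      rosetta_p10 = [0.6, 0.5, 0.7, 0.4, 0.8, 0.5, 0.6, 0.7]
      tfidf_p10   = [0.5, 0.4, 0.5, 0.4, 0.6, 0.3, 0.5, 0.6]

      t_stat, p_two_sided = stats.ttest_rel(rosetta_p10, tfidf_p10)
      p_one_sided = p_two_sided / 2  # directional hypothesis: Rosetta > baseline
      print(t_stat, p_one_sided)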

  27. Precision at top 10 (cont’d)

  28. Many retrieval errors avoided • Example: software architecture diagrams • Most papers about software architecture frequently use the term “diagrams” • Few are about tools for diagramming • TFIDF/Cosine system – 0/10 relevant • Rosetta – 4/10 relevant (3 in top 5) • Rosetta made the correct distinction more often

  29. Rosetta Shortcomings • Retrieval metric sorts search results by number of query terms matched • Some authors reuse portions of text in which other documents are cited

  30. Impact of search results • A look at the number of citations to documents retrieved for each query • Compared RDI to a baseline provided by the TFIDF/Cosine system • TFIDF/Cosine includes no measure of impact • Seeking only a measure of the relative impact of documents retrieved by RDI on a given topic

  31. Experiment • For each query… • Calculated the average citations/year for each document • Average publication year for Rosetta – 1994 • TFIDF/Cosine – 1995 • Found the median number of citations/year for each set of search results • Found the difference between the median for Rosetta and the median for TFIDF/Cosine
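
  A sketch of this computation (the record fields and reference year are assumptions):

      from statistics import median

      def citations_per_year(doc, current_year=2003):
          age = max(1, current_year - doc["year"])  # guard against zero age
          return doc["citations"] / age

      def median_impact_difference(rosetta_docs, tfidf_docs):
          """Difference between the median citations/year of the two result sets."""
          m_rosetta = median(citations_per_year(d) for d in rosetta_docs)
          m_tfidf = median(citations_per_year(d) for d in tfidf_docs)
          return m_rosetta - m_tfidf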

  32. Difference in impact • On average the median citations/year… • 8.9 for Rosetta • 1.5 for the baseline

  33. Difference in impact (cont’d)

  34. Summary of Experiments • Small study – results are tentative • Surpassed retrieval precision of a widely used relevance-based approach • Consistently retrieved documents that have had a significant impact

  35. Future Work • Retrieval metric that eliminates Boolean component • Large scale implementation with CiteSeer data • Studies with more sophisticated relevance-based retrieval systems • Comparison with popularity-based retrieval techniques

  36. Contact Shannon Bradshaw The University of Iowa shannon-bradshaw@uiowa.edu www.biz.uiowa.edu/sbradshaw
