CiteSight: Contextual Citation Recommendation with Differential Search. Avishay Livne 1, Vivek Gokuladas 2, Jaime Teevan 3, Susan Dumais 3, Eytan Adar 1. 1 University of Michigan, 2 Qualcomm, 3 Microsoft. #SIGIR18 #JaimesBackyard.
Why Do We Cite? • Paying homage to pioneers • Giving credit for related work • Identifying methodology • Providing background • Correcting one’s work • Correcting the work of others • Substantiating claims • … [Garfield, 1965]
How Do We Cite? • Many resources • Search engines • Bibliographic tools • Colleagues • Work practice • Papers we know • Papers we should know
Why × How = 2 Specs • Spec 1 • I know what I want, give it to me now • Citation context: • “… calculating the differences between blocks of text [” • Spec 2 • I don’t know or can’t remember what I want • [cite] • Complex, dynamic search space = slow • Inherent trade-off • Can we build a system to support both?
Split World Into Two • Stuff I want fast = stuff I know about • Stuff I don’t know about • [Figure label: Microsoft Academic]
Strategy • Small, personalized index • Updated dynamically • What you’ve cited before • What you’ve cited now • What other people have cited • Venue, co-citation, etc. • Run a big index for everything else
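The small, dynamically updated index described on this slide could be sketched roughly as follows. This is a minimal illustration, not the CiteSight implementation: the class name `PersonalIndex`, the `cocited` lookup table, and the dict-based corpus are all assumptions made for the example.

```python
class PersonalIndex:
    """Hypothetical small index that grows as the author cites papers."""

    def __init__(self, corpus, cocited):
        self.corpus = corpus      # big index: paper id -> text
        self.cocited = cocited    # paper id -> ids often co-cited with it (assumed given)
        self.papers = {}          # small index: paper id -> text

    def add_citation(self, paper_id):
        """Cache the cited paper plus the papers other people co-cite with it."""
        for pid in [paper_id, *self.cocited.get(paper_id, [])]:
            if pid in self.corpus:
                self.papers[pid] = self.corpus[pid]


corpus = {"a": "diff algorithms", "b": "text comparison", "c": "ranking"}
index = PersonalIndex(corpus, cocited={"a": ["b"]})
index.add_citation("a")
cached_ids = sorted(index.papers)
```

Venue-based expansion, also listed on the slide, would be another source of candidate ids and is left out here to keep the sketch short.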
Ranking • Query: Citation context • “… calculating the differences between blocks of text [” • Dynamic recommendations • Immediately: Search the cache • In the background: Search the full index • Rank retrieved papers: • Gradient boosted regression tree • Features: network + text • Popularity, author similarity, textual similarity,…
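The "immediately / in the background" split on this slide can be illustrated with a small sketch. It assumes a toy word-overlap score in place of the gradient boosted ranker, and the function names `search` and `differential_search` are hypothetical, chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def tokenize(text):
    return set(text.lower().split())


def search(index, context, k=3):
    """Rank papers in `index` (paper id -> text) by word overlap with the context."""
    query = tokenize(context)
    scored = [(len(query & tokenize(text)), pid) for pid, text in index.items()]
    scored = [(s, pid) for s, pid in scored if s > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [pid for _, pid in scored[:k]]


def differential_search(context, cache, corpus, executor, k=3):
    """Return cache hits right away and a future for the slower full search."""
    background = executor.submit(search, corpus, context, k)
    immediate = search(cache, context, k)
    return immediate, background


corpus = {
    "p1": "an algorithm for calculating differences between files",
    "p2": "gradient boosted regression trees",
    "p3": "differences between blocks of text in revisions",
}
cache = {"p1": corpus["p1"]}
context = "calculating the differences between blocks of text"

with ThreadPoolExecutor(max_workers=1) as executor:
    fast, pending = differential_search(context, cache, corpus, executor)
    full = pending.result()  # in a UI this would update the list when ready
```

The caller can show `fast` at once and swap in `full` when the future completes, which is the trade-off the two indices are meant to resolve.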
Citation context is really good at picking out “winners” • People talk about a paper the same way as you! • Not the same way the author talks about their work • Citation context: “XYZ is similar to ABC […]”, “Bob et al. introduced ABC in […]”, “We utilize ABC to… […]” • Paper text
That’s nice… Citations (S. Redner, 1998)
Context Coupling • A and B related • Co-cited: When B is mentioned, A is too • “Borrow” contexts from A to B • Borrowed context used as a feature in ranking papers • A: popular paper • B: less-popular paper
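One way to picture context coupling is the sketch below. It is an assumption-laden illustration: the co-citation threshold `min_cocitations`, the symmetric borrowing in both directions, and the function names are not taken from the talk, which only states that contexts are borrowed from A to B.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of papers appears in the same reference list."""
    counts = defaultdict(int)
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


def couple_contexts(contexts, counts, min_cocitations=2):
    """Return paper -> own contexts plus contexts borrowed from co-cited papers."""
    coupled = {pid: list(ctxs) for pid, ctxs in contexts.items()}
    for (a, b), n in counts.items():
        if n < min_cocitations:
            continue  # assumed threshold: weakly co-cited pairs share nothing
        coupled.setdefault(a, []).extend(contexts.get(b, []))
        coupled.setdefault(b, []).extend(contexts.get(a, []))
    return coupled


reference_lists = [["A", "B"], ["A", "B", "C"], ["A"]]
contexts = {"A": ["introduced ABC in", "we utilize ABC to"], "B": []}
coupled = couple_contexts(contexts, cocitation_counts(reference_lists))
```

Here the less-popular paper B has no contexts of its own but inherits A's, so a text-similarity feature has something to match against.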
CiteSight Evaluation • Can CiteSight predict existing citations? • 1000 randomly selected CS papers (2011) • Criteria: 20-40 citations • 5-fold cross validation • Metric: NDCG • Gain of 1 when the system guesses the correct citation • Gain related to # of co-citations for close guesses • User feedback from 5 CS grad students
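For readers unfamiliar with the metric, a standard NDCG@k computation looks roughly like this. The sketch uses the common log2 discount; how the talk maps co-citation counts to partial gains is not specified, so the gain values in the example are placeholders.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_gains, all_gains, k=10):
    """DCG of the top-k ranking, normalized by the best achievable DCG."""
    ideal = sorted(all_gains, reverse=True)[:k]
    best = dcg(ideal)
    return dcg(ranked_gains[:k]) / best if best > 0 else 0.0


# Gain 1.0 = the actual citation; 0.5 = a placeholder gain for a close guess.
all_gains = [1.0, 0.5, 0.0]
perfect = ndcg_at_k([1.0, 0.5, 0.0], all_gains)
swapped = ndcg_at_k([0.5, 1.0, 0.0], all_gains)
```

A ranking that puts the correct citation first scores 1.0, and any ranking that pushes it down scores lower.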
Results • Large improvement • Context coupling • All features • Citation-related features > text • More info = better • Authors • Citations, to a point
Cache v. Corpus • Relevance • Cache accounts for 46% of NDCG@10 of the corpus • 10% cache is better • Speed • Cache: 6 ms • Instantaneous! • Corpus: 450 ms
Summary • Differential need for speed • CiteSight – differential search • Two different use cases = two indices • Local index updated dynamically, contextually • Global index with full content • Context coupling improves relevance • Local index improves speed • Able to provide instantaneous results • Often relevant because contextually updated