1 / 38

Time Will Tell: Leveraging Temporal Expressions in IR

İrem Arıkan, Srikanta Bedathur, Klaus Berberich


Presentation Transcript


  1. Time Will Tell: Leveraging Temporal Expressions in IR. İrem Arıkan, Srikanta Bedathur, Klaus Berberich

  2. Motivation • Documents contain temporal information in the form of temporal expressions

  3. Motivation • Documents contain temporal information in the form of temporal expressions

  4. Motivation • Users have temporal information needs • Query: Prime Minister United Kingdom 2000

  5. Motivation • Users have temporal information needs • Query: Prime Minister United Kingdom 2000 • Problem: traditional information retrieval systems do not exploit the temporal content of documents, although temporal expressions are more than common terms

  6. Motivation • Users have temporal information needs • Query: Prime Minister United Kingdom 2000 • Problem: traditional information retrieval systems do not exploit the temporal content of documents, although temporal expressions are more than common terms • Our approach: integrate the temporal dimension into a language-model-based retrieval framework

  7. Outline • Motivation • Model • Our Approach • Experimental Evaluation

  8. Document Model • Document d = {d_text, d_temp} • d_text: a bag of textual terms • d_temp: a bag of temporal expressions

  9. Document Model • Document d = {d_text, d_temp} • d_text: a bag of textual terms • d_temp: a bag of temporal expressions • A temporal expression is treated as a time interval T = [begin, end] (figure: T shown as an interval from begin to end on a time axis starting at 0)

  10. Query Model • Query q = {q_text, q_temp} • q_text: set of textual terms • q_temp: set of temporal expressions • Example: in Prime Minister United Kingdom 2000, the terms Prime Minister United Kingdom form q_text and 2000 forms q_temp
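
The document and query model map onto a small data structure. A minimal Python sketch, assuming day-granularity dates as interval endpoints; the names Interval, Document, Query and the helper year() are hypothetical, and lower-casing the query terms is only an illustrative choice:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Interval:
    """A temporal expression as a closed time interval T = [begin, end]."""
    begin: date
    end: date

    def __post_init__(self) -> None:
        if self.begin > self.end:
            raise ValueError("begin must not be later than end")


def year(y: int) -> Interval:
    # Hypothetical helper: a year-level expression such as "2000"
    # is taken to span 1 January to 31 December.
    return Interval(date(y, 1, 1), date(y, 12, 31))


@dataclass
class Document:
    text: dict[str, int]   # d_text: bag of textual terms (term -> count)
    temp: list[Interval]   # d_temp: bag of temporal expressions (repeats allowed)


@dataclass
class Query:
    text: set[str]         # q_text: set of textual terms
    temp: set[Interval]    # q_temp: set of temporal expressions


# The example query "Prime Minister United Kingdom 2000".
q = Query(text={"prime", "minister", "united", "kingdom"}, temp={year(2000)})
```

A list keeps repeated temporal expressions in d_temp, matching the "bag" in the document model, while the query side uses sets.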

  11. Outline • Motivation • Model • Our Approach • Filtering Approach • Weighted Approach • Experimental Evaluation

  12. Our Baseline: Ponte and Croft's Model (LM) • Each document has an associated language model • The query is viewed as the outcome of a random process • Documents are ranked by the likelihood that the query would be generated by the language model estimated for each document
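
The ranking principle can be sketched in Python. Ponte and Croft's own estimator differs in its details, so this sketch swaps in Dirichlet-smoothed query likelihood, a common simpler variant that likewise combines a term's frequency in the document with its collection frequency; the function name query_log_likelihood and the smoothing parameter mu = 2000 are assumptions:

```python
import math
from collections import Counter


def query_log_likelihood(query_terms, doc_tf, coll_tf, coll_len, mu=2000.0):
    """log P(q | d) under a Dirichlet-smoothed unigram document model.

    A stand-in for the baseline, not Ponte and Croft's exact estimator.
    """
    doc_len = sum(doc_tf.values())
    score = 0.0
    for w in query_terms:
        p_coll = coll_tf.get(w, 0) / coll_len  # collection probability of w
        if p_coll == 0.0:
            # A term unseen in the whole collection would give every document
            # a likelihood of zero, so it is skipped instead.
            continue
        # Blend the document's term frequency with the collection statistics.
        score += math.log((doc_tf.get(w, 0) + mu * p_coll) / (doc_len + mu))
    return score


docs = {
    "d1": Counter("prime minister united kingdom".split()),
    "d2": Counter("football united kingdom cup final".split()),
}
collection = sum(docs.values(), Counter())
coll_len = sum(collection.values())
query = ["prime", "minister", "united", "kingdom"]
ranking = sorted(
    docs,
    key=lambda doc_id: query_log_likelihood(query, docs[doc_id], collection, coll_len),
    reverse=True,
)
```

Working in log space avoids numerical underflow when the product over query terms gets long.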

  13. Filtering Approach (LMF) • Idea: Discard all documents that do not contain any temporal expression relevant to the user's query

  14. Filtering Approach • Idea: Discard all documents that do not contain any temporal expression relevant to the user's query • Our definition of temporal relevance: a temporal expression is relevant only if it overlaps with a temporal expression from the query • (figure: the query 2000 and the document expressions 2 May 1997 – 27 June 2007 and 28 Nov 1990 – 2 May 1997, drawn as intervals [begin, end] on a time axis t)

  15. Filtering Approach • Idea: Discard all documents that do not contain any temporal expression relevant to the user's query • Our definition of temporal relevance: a temporal expression is relevant only if it overlaps with a temporal expression from the query • For the query 2000: 2 May 1997 – 27 June 2007 is relevant; 28 Nov 1990 – 2 May 1997 is irrelevant
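
A minimal Python sketch of the LMF filter, assuming day-granularity closed intervals; the helper names are hypothetical, and the sketch only filters, on the assumption that the surviving documents are then ranked by the text-based LM:

```python
from datetime import date
from typing import NamedTuple


class Interval(NamedTuple):
    begin: date
    end: date


def overlaps(a: Interval, b: Interval) -> bool:
    # Two closed intervals overlap iff each starts no later than the other ends.
    return a.begin <= b.end and b.begin <= a.end


def is_temporally_relevant(doc_temp: list[Interval], query_temp: list[Interval]) -> bool:
    # Relevant if at least one document expression overlaps a query expression.
    return any(overlaps(t, q) for t in doc_temp for q in query_temp)


def lmf_filter(doc_temps: dict[str, list[Interval]], query_temp: list[Interval]) -> list[str]:
    """Ids of the documents that survive the filter (ranking is left to the LM)."""
    return [doc_id for doc_id, temp in doc_temps.items()
            if is_temporally_relevant(temp, query_temp)]


query_temp = [Interval(date(2000, 1, 1), date(2000, 12, 31))]  # the query "2000"
doc_temps = {
    "a": [Interval(date(1997, 5, 2), date(2007, 6, 27))],   # overlaps 2000
    "b": [Interval(date(1990, 11, 28), date(1997, 5, 2))],  # ends before 2000
    "c": [],                                                 # no temporal expressions
}
print(lmf_filter(doc_temps, query_temp))  # ['a']
```

With the slide's example, only the document containing 2 May 1997 – 27 June 2007 survives; a document without any temporal expression is discarded as well.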

  16. Filtering Approach • Problem: it takes a black-and-white view of the world • It does not take into account • how many relevant temporal expressions a document contains • how closely they match the temporal expressions specified in the user's query

  17. Filtering Approach • Problem: it takes a black-and-white view of the world • It does not take into account • how many relevant temporal expressions a document contains • how closely they match the temporal expressions specified in the user's query • Example: for the query 1980 – 1990, the expression 1980 – 1989 is more relevant than 23 March 1984

  18. Weighted Approach (LMW) • Idea: Assign higher relevance to a document if it contains more temporal expressions that match those in the user's query, and if they match more closely

  19. Weighted Approach • Idea: Assign higher relevance to a document if it contains more temporal expressions that match those in the user's query, and if they match more closely • We assume that q_text and q_temp are generated independently

  20. Weighted Approach • Idea: Assign higher relevance to a document if it contains more temporal expressions that match those in the user's query, and if they match more closely • We assume that q_text and q_temp are generated independently • We also assume that temporal expressions occur independently of each other
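
Taken together, the two independence assumptions suggest the following factorization of the query likelihood (a reading of the slide, not a formula shown on it), where P(q_text | d_text) comes from the text-only language model:

```latex
P(q \mid d) = P(q_{text} \mid d_{text}) \cdot P(q_{temp} \mid d_{temp}),
\qquad
P(q_{temp} \mid d_{temp}) = \prod_{Q \in q_{temp}} P(Q \mid d_{temp})
```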

  21. Weighted Approach • Each temporal expression T in d is a sample from a different generative model

  22. Weighted Approach • Each temporal expression T in d is a sample from a different generative model • Generating a temporal expression Q = [qBegin, qEnd] given d_temp: • draw a single temporal expression T = [dBegin, dEnd] uniformly at random from d_temp • generate Q using the generative model associated with T

  23. Weighted Approach • Each temporal expression T in d is a sample from a different generative model • Generating a temporal expression Q = [qBegin, qEnd] given d_temp: • draw a single temporal expression T = [dBegin, dEnd] uniformly at random from d_temp • generate Q using the generative model associated with T • The result is the likelihood of generating Q by the set of generative models that produced d_temp
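
The two-step process on this slide (a uniform draw of T, then generation from T's model) is a uniform mixture over the expressions in the document. A direct transcription, with |d_temp| counting repeated expressions because d_temp is a bag:

```latex
P(Q \mid d_{temp}) = \frac{1}{|d_{temp}|} \sum_{T \in d_{temp}} P(Q \mid T)
```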

  24. Weighted Approach • Generate Q = [qBegin, qEnd] from the query by the generative model associated with T = [dBegin, dEnd] from a document • (figure: P(qBegin) plotted on an axis marked dBegin - α(dEnd - dBegin), qBegin, dBegin and dEnd; P(qEnd | qBegin) plotted on an axis marked qBegin, qEnd, dEnd and dEnd + α(dEnd - dBegin))

  25. Weighted Approach • Generate Q = [qBegin, qEnd] from the query by the generative model associated with T = [dBegin, dEnd] from a document • (figure: P(qBegin) plotted on an axis marked dBegin - α(dEnd - dBegin), qBegin, dBegin and dEnd; P(qEnd | qBegin) plotted on an axis marked qBegin, qEnd, dEnd and dEnd + α(dEnd - dBegin)) • The model associated with T produces only temporal expressions relevant to T, i.e. intervals that overlap it • P(Q|T) gets smaller as the length of the overlap between Q and T decreases
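
The slides specify this model only through figures, so the following Python sketch rests on loudly stated assumptions: time is discretized into integer units (years or days); a generated interval's begin lies between dBegin - α(dEnd - dBegin) and dEnd and its end between max(qBegin, dBegin) and dEnd + α(dEnd - dBegin), following the axis marks; inside that range P(Q|T) is taken to be exactly proportional to the overlap length, reading the backup slide's P(b', e') ~ |overlap| literally; α = 0.5 and rounding α(dEnd - dBegin) down are arbitrary. The original densities may differ, and all function names are hypothetical.

```python
import math
from functools import lru_cache
from typing import NamedTuple


class Interval(NamedTuple):
    begin: int  # inclusive, in discrete time units (e.g. years or days)
    end: int    # inclusive


def support(t: Interval, alpha: float) -> tuple[int, int]:
    """Outer limits for generated intervals: T widened by alpha*(end-begin) per side."""
    ext = math.floor(alpha * (t.end - t.begin))  # rounding down is an assumption
    return t.begin - ext, t.end + ext


def overlap_len(a: Interval, b: Interval) -> int:
    # Number of shared time units of two closed intervals (0 if disjoint).
    return max(0, min(a.end, b.end) - max(a.begin, b.begin) + 1)


@lru_cache(maxsize=None)
def normaliser(t: Interval, alpha: float) -> int:
    """Sum of overlap lengths over every interval the model of T can generate."""
    lo, hi = support(t, alpha)
    z = 0
    for q_begin in range(lo, t.end + 1):
        k = t.end - max(q_begin, t.begin) + 1  # overlap once q_end reaches t.end
        # q_end from max(q_begin, t.begin) up to t.end yields overlaps 1..k;
        # q_end from t.end + 1 up to hi yields (hi - t.end) intervals of overlap k.
        z += k * (k + 1) // 2 + k * (hi - t.end)
    return z


def p_expr_given_expr(q: Interval, t: Interval, alpha: float = 0.5) -> float:
    """P(Q | T): proportional to |overlap| inside the support, zero outside."""
    lo, hi = support(t, alpha)
    if not (lo <= q.begin <= t.end and max(q.begin, t.begin) <= q.end <= hi):
        return 0.0
    return overlap_len(q, t) / normaliser(t, alpha)


def p_expr_given_doc(q: Interval, doc_temp: list[Interval], alpha: float = 0.5) -> float:
    """P(Q | d_temp): draw T uniformly from the bag, then generate Q from T's model."""
    if not doc_temp:
        return 0.0
    return sum(p_expr_given_expr(q, t, alpha) for t in doc_temp) / len(doc_temp)
```

Under this reading a point-like expression such as [1984, 1984] generates only itself, so with years as units the query [1980, 1990] gets zero probability from it but a positive one from [1980, 1989]; the slides do not say how such zeros are handled.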

  26. Outline • Motivation • Model • Our Approach • Experimental Evaluation

  27. Experimental Evaluation • Dataset: HTML snapshot of the English Wikipedia from May 2007, containing about 2M documents • Implementation: • Terrier Information Retrieval Platform, which provides an implementation of Ponte & Croft's approach • LMF, LMW • Java + MySQL • A set of regular expressions for extracting temporal information
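
The slides mention only "a set of regular expressions". The following Python sketch is a hypothetical, much smaller set covering three forms seen in this deck (full dates such as 23 March 1984, year ranges such as 1650 - 1670, and bare years such as 2000), each mapped to a closed date interval. The patterns and names are assumptions; bare four-digit numbers are over-eagerly taken as years, and forms such as 18th century are not handled.

```python
import re
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Alternatives are tried left to right at each position, so a full date
# wins over a year range, which in turn wins over a bare year.
TEMPORAL = re.compile(
    r"\b(?P<day>\d{1,2}) (?P<month>" + "|".join(MONTHS) + r") (?P<dyear>\d{4})\b"
    r"|\b(?P<start>\d{4}) ?[-–] ?(?P<stop>\d{4})\b"
    r"|\b(?P<year>\d{4})\b"
)


def extract_intervals(text: str) -> list[tuple[date, date]]:
    """Map each matched expression to a closed (begin, end) interval of dates."""
    intervals = []
    for m in TEMPORAL.finditer(text):
        try:
            if m.group("day"):
                d = date(int(m.group("dyear")), MONTHS[m.group("month")], int(m.group("day")))
                intervals.append((d, d))  # a single day
            elif m.group("start"):
                first, last = int(m.group("start")), int(m.group("stop"))
                if first <= last:
                    intervals.append((date(first, 1, 1), date(last, 12, 31)))
            else:
                y = int(m.group("year"))
                intervals.append((date(y, 1, 1), date(y, 12, 31)))
        except ValueError:
            continue  # e.g. "31 February 2000" or year 0000 is not a valid date
    return intervals
```

Pairs of full dates such as 2 May 1997 – 27 June 2007 come out as two single-day intervals here; joining them into one range would need an extra pattern.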

  28. Experimental Evaluation • Anecdotal query results (1) • Query: Spanish painter 18th century

  29. Experimental Evaluation • Anecdotal query results (2) • Query: Sea Battle 1650 - 1670

  30. Experimental Evaluation • User study • 20 queries • Pooling of the top-10 results returned by the three methods (LM, LMF, LMW) • Relevance assessments by 15 users on a graded scale • highly relevant: 2 • marginally relevant: 1 • irrelevant: 0 • NDCG as the measure of effectiveness
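
NDCG over these graded labels can be sketched in Python. The slides do not say which gain and discount were used, how the grades of the 15 assessors were combined, or how the ideal ranking was formed; this sketch assumes the common 2^rel - 1 gain with a log2(rank + 1) discount, one grade per document, and an ideal ranking built from all pooled judgments for the query. The function names are hypothetical.

```python
import math


def dcg(labels: list[int], k: int) -> float:
    # Gain 2^rel - 1 with a log2(rank + 1) discount (one common NDCG variant).
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(labels[:k], start=1))


def ndcg_at_k(ranked_labels: list[int], judged_labels: list[int], k: int = 10) -> float:
    """ranked_labels: grades (0, 1, 2) of one method's results in rank order.
    judged_labels: grades of all pooled documents for the query (ideal ranking).
    """
    ideal = dcg(sorted(judged_labels, reverse=True), k)
    return dcg(ranked_labels, k) / ideal if ideal > 0 else 0.0
```

Averaging ndcg_at_k over the 20 queries gives one effectiveness figure per method.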

  31. Experimental Evaluation

  32. Thank you! Questions?

  33. Conclusion • Documents are rich in temporal expressions, but existing retrieval models ignore their inherent semantics • Our work proposes two methods that address this problem • Initial experimental evidence shows that our methods improve retrieval effectiveness for temporal information needs

  34. Experimental Evaluation

  35. Weighted Approach • Generative model associated with T = [b, e] • (figure: P(b') plotted on an axis marked b - α(e - b), b and e; P(e') plotted on an axis marked b', e and e + α(e - b)) • The model only generates intervals that overlap T • P(b', e') ~ |overlap|

  36. Our Baseline: Ponte and Croft's Model (LM) • Query likelihood: the likelihood that a query q and a document d are generated by the same language model • It depends on the term frequency of the query words in the document and their collection frequency
