Clustering and Exploring Search Results using Timeline Constructions. Presenter: Tsai Tzung Ruei. Authors: Omar Alonso, Michael Gertz, Ricardo Baeza-Yates. 國立雲林科技大學 National Yunlin University of Science and Technology. CIKM 2009.
Outline • Motivation • Objective • Time annotated document model • Methodology • Experiments • Conclusion • Comments
Motivation • None of the current search engines exploits the temporal information embedded in documents. • Do you think current timelines for organizing or clustering search results (such as Google's timeline) are useful for some of your daily search activities? • Do you use (or would you use) timelines to explore search results? • Please indicate some search scenarios where you use, or would like to use, timelines to organize search results. • Please give some examples of search scenarios where current search engines do not sufficiently support the concept of timelines to organize and explore search results. • What other features would you like to see in the context of timelines? (Figure label: timeline)
Objective • To present an add-on to traditional information retrieval applications in which we exploit various temporal information associated with documents to present and cluster documents along timelines.
TIME ANNOTATED DOCUMENT MODEL • Time and Timelines: our base timeline, denoted Td, is an interval of consecutive day chronons. EX: “March 12, 2002; March 13, 2002; March 14, 2002” • Temporal Expressions: explicit temporal expressions, EX: “December 2004”; implicit temporal expressions, EX: “Valentine's Day 2006”; relative temporal expressions, EX: “today” • Temporal Document Profiles (figure labels: explicit, implicit, relative, timestamps)
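The model on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ExprKind`, `TemporalExpression`, `TemporalDocumentProfile`, `day_chronons` and `resolve_relative` are hypothetical, and anchoring a relative expression to the document timestamp is an assumption made here for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from enum import Enum
from typing import List


class ExprKind(Enum):
    EXPLICIT = "explicit"  # e.g. "December 2004"
    IMPLICIT = "implicit"  # e.g. "Valentine's Day 2006"
    RELATIVE = "relative"  # e.g. "today"


def day_chronons(start: date, end: date) -> List[date]:
    """Return the consecutive day chronons from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


@dataclass
class TemporalExpression:
    text: str
    kind: ExprKind
    start: date  # first day chronon covered by the expression
    end: date    # last day chronon covered by the expression


@dataclass
class TemporalDocumentProfile:
    doc_id: str
    timestamp: date
    expressions: List[TemporalExpression] = field(default_factory=list)


def resolve_relative(text: str, timestamp: date) -> TemporalExpression:
    """Anchor a relative expression to the document timestamp (assumption).

    Only a few illustrative words are handled; anything else raises KeyError.
    """
    offsets = {"yesterday": -1, "today": 0, "tomorrow": 1}
    day = timestamp + timedelta(days=offsets[text])
    return TemporalExpression(text, ExprKind.RELATIVE, day, day)


# Usage: the three-day base timeline from the slide, and one small profile.
td = day_chronons(date(2002, 3, 12), date(2002, 3, 14))
tdp = TemporalDocumentProfile("d1", timestamp=date(2006, 2, 14))
tdp.expressions.append(TemporalExpression(
    "December 2004", ExprKind.EXPLICIT, date(2004, 12, 1), date(2004, 12, 31)))
tdp.expressions.append(TemporalExpression(
    "Valentine's Day 2006", ExprKind.IMPLICIT, date(2006, 2, 14), date(2006, 2, 14)))
tdp.expressions.append(resolve_relative("today", tdp.timestamp))
```

Storing each expression as a start and end day keeps coarser expressions such as “December 2004” on the same day-level base timeline as single-day ones.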
Methodology • PROTOTYPE • Process Overview (figure components: Corpora; Alembic (POS tagger); GUTime temporal tagger; XML document (tdp); Oracle)
Methodology • TCluster • Input: a hit list Lq = [d1, d2, ..., dk] of k documents • Constructing a Time Outline for the documents in the hit list Lq • Document Clustering • Ranking Documents in a Cluster
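The three TCluster steps listed on this slide can be sketched roughly as follows. This is a simplified illustration, not the algorithm from the paper: the input shape `Hit`, the year granularity, and ranking by the number of temporal mentions (ties keep hit-list order) are all assumptions chosen to keep the example short.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical input shape: (doc_id, dates mentioned in the document).
Hit = Tuple[str, List[date]]


def time_outline(hits: List[Hit]) -> List[int]:
    """Step 1: sorted years (assumed granularity) mentioned in the hit list."""
    return sorted({d.year for _, dates in hits for d in dates})


def cluster_by_year(hits: List[Hit]) -> Dict[int, List[str]]:
    """Steps 2 and 3: one cluster per year, ranked by mention count.

    A document may land in several clusters. Ranking by mention count is a
    stand-in for the paper's ranking function, which the slide does not give.
    """
    counts: Dict[int, Dict[str, int]] = defaultdict(dict)
    for doc_id, dates in hits:
        for d in dates:
            counts[d.year][doc_id] = counts[d.year].get(doc_id, 0) + 1
    clusters: Dict[int, List[str]] = {}
    for year in time_outline(hits):
        docs = counts[year]
        # sorted() is stable and dicts keep insertion order,
        # so documents with equal counts stay in hit-list order.
        clusters[year] = sorted(docs, key=lambda doc: -docs[doc])
    return clusters


# Usage with a made-up hit list Lq of three documents.
Lq = [
    ("d1", [date(2006, 6, 9), date(2002, 5, 31)]),
    ("d2", [date(2006, 7, 9), date(2006, 6, 9)]),
    ("d3", [date(2002, 6, 30)]),
]
outline = time_outline(Lq)
clusters = cluster_by_year(Lq)
```

Here `d1` appears in both the 2002 and the 2006 cluster because it mentions dates in both years, which is the kind of overlap a timestamp-only clustering would not show.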
Experiments • DMOZ • Introduction: a multilingual open content directory. • World Cup documents: document clusters for 2010, 2006, 2002, 1998 and 1994. Each World Cup document has a single event as its main theme, and the documents are well classified by users in terms of the actual event. • Result: pre-defined categories (5) < TCluster (21)
Experiments • The TimeBank 1.2 corpus • It contains news articles that have been annotated using TimeML with temporal expressions related to events, times and temporal links between events and times. • Result: a 50% increase in the number of clusters discovered by TCluster.
Experiments • Relevance Evaluation using AMT • AMT (Amazon Mechanical Turk) is a crowdsourcing platform. • Result: the average response was 4.04 (with an 80% agreement level).
Conclusion • MAJOR CONTRIBUTION • The TCluster algorithm provides great flexibility and allows users to explore clusters of search result documents that are organized along well-defined timelines, supporting different levels of time granularity. • The experiments show the utility of time-based clustering over existing approaches that cluster documents based only on document timestamps. • FUTURE WORK • To study the weighting of relative temporal expressions as well as different sentence distance functions for determining the rank of documents in a cluster.
Comment • Advantage • Provides a new method of time-based searching • Drawback • Some mistakes • Application • Information retrieval • Clustering