Clustering and Exploring Search Results using Timeline Constructions

Presenter: Tsai TzungRuei
Authors: Omar Alonso, Michael Gertz, Ricardo Baeza-Yates
National Yunlin University of Science and Technology
CIKM 2009

Outline
• Motivation
• Objective
• Time annotated document model
• Methodology
• Experiments
• Conclusion
• Comments

Motivation
• Current search engines do not exploit the temporal information embedded in documents.
• Do you think current timelines for organizing or clustering search results (such as Google's timeline) are useful for some of your daily search activities?
• Do you use (or would you use) timelines to explore search results?
• Please indicate some search scenarios where you use timelines or would like to use timelines to organize search results.
• Please give some examples of search scenarios where current search engines do not sufficiently support the concept of timelines to organize and explore search results.
• What other features would you like to see in the context of timelines?
Timeline

Objective
• To present an add-on to traditional information retrieval applications that exploits the temporal information associated with documents to present and cluster documents along timelines.

Time Annotated Document Model
• Time and Timelines
  • Our base timeline, denoted Td, is an interval of consecutive day chronons. Example: "March 12, 2002; March 13, 2002; March 14, 2002"
• Temporal Expressions
  • Explicit temporal expressions. Example: "December 2004"
  • Implicit temporal expressions. Example: "Valentine's Day 2006"
  • Relative temporal expressions. Example: "today"
• Temporal Document Profiles (figure labels: explicit, implicit, relative, timestamps)
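To make the three kinds of temporal expression concrete, the following is a minimal sketch of how each could be mapped onto day chronons of the base timeline. The function names `day_chronons`, `resolve_implicit` and `resolve_relative`, and the small lookup tables, are hypothetical and are not taken from the paper.

```python
from datetime import date, timedelta
import calendar

def day_chronons(year, month=None, day=None):
    """Expand an explicit expression (year, year-month, or full date)
    into the list of day chronons it covers."""
    if month is None:
        start, end = date(year, 1, 1), date(year, 12, 31)
    elif day is None:
        last_day = calendar.monthrange(year, month)[1]
        start, end = date(year, month, 1), date(year, month, last_day)
    else:
        start = end = date(year, month, day)
    span = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(span)]

# Hypothetical lookup for implicit expressions: name -> (month, day).
NAMED_DAYS = {"Valentine's Day": (2, 14)}

def resolve_implicit(name, year):
    """Map a named day plus a year to a single day chronon."""
    month, day = NAMED_DAYS[name]
    return date(year, month, day)

# Hypothetical lookup for relative expressions: offset in days.
RELATIVE_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}

def resolve_relative(expression, doc_timestamp):
    """Anchor a relative expression on the document timestamp."""
    return doc_timestamp + timedelta(days=RELATIVE_OFFSETS[expression])
```

Under these assumptions, "December 2004" expands to 31 day chronons, "Valentine's Day 2006" resolves to a single chronon, and "today" can only be resolved once the document timestamp is known.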

Methodology
• Prototype
• Process overview (components shown in the diagram): corpora, Alembic (POS tagger), GUTime temporal tagger, XML documents, temporal document profiles (tdp), Oracle
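As an illustration of the step that turns tagged XML into a temporal document profile, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `TIMEX3` element name follows TimeML; the attribute names `type` and `val`, the sample document, and the dictionary layout of the profile are assumptions made for this example and may differ from what the prototype actually stores.

```python
import xml.etree.ElementTree as ET

def build_tdp(doc_id, tagged_xml):
    """Collect every TIMEX3 annotation of a tagged document into a
    simple profile (a hypothetical layout, not the paper's schema)."""
    root = ET.fromstring(tagged_xml)
    expressions = []
    for position, node in enumerate(root.iter("TIMEX3")):
        expressions.append({
            "text": (node.text or "").strip(),  # surface form in the document
            "type": node.get("type"),           # assumed attribute name
            "value": node.get("val"),           # assumed attribute name
            "position": position,               # order of appearance
        })
    return {"doc_id": doc_id, "expressions": expressions}

# Made-up sample input reusing the example expressions from the slides.
SAMPLE = (
    "<DOC>The meeting was held on "
    '<TIMEX3 type="DATE" val="2002-03-12">March 12, 2002</TIMEX3> '
    "and the report followed in "
    '<TIMEX3 type="DATE" val="2004-12">December 2004</TIMEX3>.</DOC>'
)

profile = build_tdp("d1", SAMPLE)
```

A profile in this form could then be stored in a database table keyed by document id, which is one plausible role for the Oracle component named on the slide.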

Methodology
• TCluster (input: a hit list Lq = [d1, d2, ..., dk] of k documents)
  • Constructing a time outline for the documents in the hit list Lq
  • Document clustering
  • Ranking documents in a cluster
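The three steps listed above can be sketched as follows. This is not the TCluster algorithm from the paper: it is a simplified stand-in that assumes each document carries ISO-style date strings, builds the time outline from the distinct labels at a chosen granularity, and ranks documents in a cluster by how many of their expressions fall into it. The names `truncate` and `time_clusters` and the sample hit list are hypothetical.

```python
from collections import defaultdict

def truncate(value, granularity):
    """Coarsen an ISO date string ('YYYY', 'YYYY-MM', 'YYYY-MM-DD') to the
    requested granularity; return None if the value is too coarse."""
    length = {"year": 4, "month": 7, "day": 10}[granularity]
    return value[:length] if len(value) >= length else None

def time_clusters(hit_list, granularity="year"):
    """Group documents along a timeline.

    hit_list: list of (doc_id, [date strings]) pairs.
    Returns a dict mapping each timeline label (in chronological order)
    to its documents, ranked by number of matching expressions."""
    counts = defaultdict(lambda: defaultdict(int))
    for doc_id, values in hit_list:
        for value in values:
            label = truncate(value, granularity)
            if label is not None:
                counts[label][doc_id] += 1
    outline = sorted(counts)  # step 1: time outline
    return {
        label: sorted(counts[label],                      # step 2: cluster
                      key=lambda d: (-counts[label][d], d))  # step 3: rank
        for label in outline
    }

# Made-up hit list for illustration only.
hits = [
    ("d1", ["2006-07-09", "2006-06-09", "2002-06-30"]),
    ("d2", ["2006-07"]),
    ("d3", ["2002-05-31", "1998"]),
]
by_year = time_clusters(hits, "year")
by_month = time_clusters(hits, "month")
```

Note that a document may appear in several clusters, and that switching `granularity` from `"year"` to `"month"` refines the timeline while dropping expressions that are only known at year level.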

Experiments
• DMOZ
  • Introduction: a multilingual open content directory
  • World Cup documents: document clusters for 2010, 2006, 2002, 1998 and 1994
  • Result documents are well classified by users in terms of the actual event.
  • Each World Cup document has a single event as the main theme.
  • Result: pre-defined categories (5) < TCluster (21)

Experiments
• The TimeBank 1.2 corpus
  • It contains news articles that have been annotated using TimeML with temporal expressions related to events, times, and temporal links between events and times.
  • Result: a 50% increase in the number of clusters discovered by TCluster

Experiments
• Relevance evaluation using AMT
  • AMT (Amazon Mechanical Turk) is a crowdsourcing platform.
  • Result: the average response was 4.04 (with an 80% agreement level)

Conclusion
• Major contribution
  • The TCluster algorithm provides great flexibility and allows users to explore clusters of search result documents that are organized along well-defined timelines, supporting different levels of time granularity.
  • It shows the utility of time-based clustering over existing approaches that cluster documents based only on document timestamps.
• Future work
  • To study the weighting of relative temporal expressions as well as different sentence distance functions for determining the rank of documents in a cluster.

Comment
• Advantage
  • Provides a new method of time-based searching
• Drawback
  • Some mistakes
• Application
  • Information retrieval
  • Clustering
