Explore the effectiveness of temporal text mining methods through a comprehensive evaluation framework. This study delves into different methods like bursty words, group representation, and combo representation, showcasing strengths and weaknesses. The approach includes cross-method evaluation and query generation to improve results, with a focus on precision-oriented measures and named entities. Results suggest that combo representation methods are more robust, especially in precision-oriented tasks. The study also highlights the importance of standardized, diverse datasets for future work and suggests exploring sources of bias in method evaluation.
From bursty patterns to bursty facts: The effectiveness of temporal text mining for news. Ilija Subašić & Bettina Berendt, K.U. Leuven, Belgium
Agenda • 1) The first problem: temporal text mining • 2) Solution methods • 3) The second problem: evaluation of 2) • 4) Our approach: cross-evaluation framework • 5) Case study: evaluation of 3 methods
Temporal text mining (TTM): The problem • What happened? • What were the important new developments in a time period?
TTM: keyword representation methods [Kleinberg, 2002] • “Bursty” ~ more frequent in a given period (here 1985-94) than over the whole analysed time span
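The notion of "more frequent in a period than overall" can be illustrated with a simple frequency ratio. This is a minimal sketch and not Kleinberg's burst-detection automaton: the function names, the toy corpus and the ratio itself are assumptions chosen for illustration only.

```python
from collections import Counter

def relative_freq(docs):
    """Relative frequency of each word over a list of tokenized documents."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def burstiness(period_docs, all_docs):
    """Ratio of a word's relative frequency in the period to its relative
    frequency in the whole corpus. Values above 1 mean the word is
    over-represented in the period. Assumes period_docs is part of all_docs."""
    period = relative_freq(period_docs)
    overall = relative_freq(all_docs)
    return {w: period[w] / overall[w] for w in period}

# Hypothetical toy corpus: two weeks of tokenized documents.
corpus = {
    "week1": [["court", "trial", "begins"], ["weather", "report"]],
    "week2": [["britney", "shaves", "hair"], ["britney", "hair", "salon"],
              ["weather", "report"]],
}
all_docs = [doc for docs in corpus.values() for doc in docs]
scores = burstiness(corpus["week2"], all_docs)
top_bursty = sorted(scores, key=scores.get, reverse=True)[:3]
```

In this toy data, "britney" appears only in week 2 and scores above 1, while "weather" appears in both weeks and scores below 1.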
TTM: group representation methods [Mei & Zhai, 2005]
TTM: combo representation methods – STORIES: graphical summary [Subašić, Berendt, 2008+]
STORIES: story tracking and exploration • Demo at ECML/PKDD
Evaluations so far • Standardized tasks and competitions • DUC update task & ROUGE framework: summarization • TREC Novelty track (2002-04): novel-sentence retrieval • Disadvantages: • Number of documents too small: 10 for DUC and 25 per topic for TREC • Output is textual → not possible to compare all TTM methods • TTM: evaluations limited to the respective method and corpora
Cross-evaluation: Our approach (1) • Pipeline: Patterns → Query generation → Queries → Sentence retrieval → Retrieved sentences • Query generation: generic / method-specific (top bursty elements & combinations) • Sentence retrieval: query-likelihood retrieval (QL) • Example: the query “shave hair britney_spears” retrieves “Last Friday, pop star Britney shaved her head, parting with her long hair.” • More: https://sites.google.com/site/subasicilija/ttm-evaluation
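The sentence-retrieval step can be sketched as a query-likelihood ranking. This is a minimal illustration and not the authors' implementation: the Jelinek-Mercer smoothing, the weight `lam`, the tokenizer and the second sentence are assumptions. Stemming and entity normalization are omitted, so the query is simplified to tokens that occur literally in the text.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation."""
    tokens = (t.strip('.,!?"').lower() for t in text.split())
    return [t for t in tokens if t]

def query_likelihood(query, sentence, coll_counts, coll_len, lam=0.5):
    """log P(query | sentence) under a unigram model, mixing sentence and
    collection frequencies (Jelinek-Mercer smoothing, an assumed choice)."""
    sent_counts = Counter(sentence)
    score = 0.0
    for term in query:
        p_sent = sent_counts[term] / len(sentence) if sentence else 0.0
        p_coll = coll_counts[term] / coll_len
        p = (1 - lam) * p_sent + lam * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        score += math.log(p)
    return score

sentences = [
    "Last Friday, pop star Britney shaved her head, parting with her long hair.",
    "The trial was postponed until next week.",  # hypothetical distractor
]
tokenized = [tokenize(s) for s in sentences]
coll_counts = Counter(t for s in tokenized for t in s)
coll_len = sum(coll_counts.values())

query = ["hair", "britney"]  # simplified stand-in for a generated query
ranked = sorted(
    range(len(sentences)),
    key=lambda i: query_likelihood(query, tokenized[i], coll_counts, coll_len),
    reverse=True,
)
```

Here `ranked` lists sentence indices from most to least likely to have generated the query, so the Britney sentence comes first.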
Our approach (2): Sentences’ “precision/recall” • Pipeline: Patterns → Query generation → Queries → Sentence retrieval → Retrieved sentences → IR-style evaluation against ground-truth sentences • Query generation: generic / method-specific (top bursty elements & combinations) • Sentence retrieval: query-likelihood retrieval (QL) • IR-style evaluation: ROUGE-2, ROUGE-SU4, aggregate measures, Friedman test and Tukey’s multiple comparison test
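To make the ROUGE-based comparison concrete, the recall variant of ROUGE-2 can be sketched as clipped bigram overlap between a retrieved sentence and a ground-truth sentence. This is a simplified illustration, not the official ROUGE toolkit; the function names and the example sentences are assumptions.

```python
from collections import Counter

def bigrams(tokens):
    """Multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))

def rouge2_recall(candidate, reference):
    """Fraction of reference bigrams that also occur in the candidate,
    with counts clipped to the candidate's counts."""
    ref = bigrams(reference)
    if not ref:
        return 0.0
    cand = bigrams(candidate)
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / sum(ref.values())

# Hypothetical ground-truth sentence and two retrieved sentences.
reference = "britney shaved her head".split()
full_match = "pop star britney shaved her head on friday".split()
partial_match = "she shaved her head".split()

score_full = rouge2_recall(full_match, reference)
score_partial = rouge2_recall(partial_match, reference)
```

The first candidate contains all three reference bigrams, the second only two of them, so it scores lower.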
Our approach (3): “Recall-oriented” aggregate measure maxMR • For time period t: best fit (ROUGE) between the retrieved sentences of t and the ground-truth sentences of t • Normalize by the max. possible best fit, taken over all sentences of t
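One possible reading of "best fit, normalized by the maximum possible best fit" is sketched below. The slide does not spell out the definition of maxMR, so everything here is an assumption: the averaging over ground-truth sentences, the function names, the toy sentences, and the use of plain bigram recall as a stand-in for ROUGE.

```python
def fit(candidate, reference):
    """Fraction of reference bigrams found in the candidate
    (a stand-in for a ROUGE score)."""
    ref = list(zip(reference, reference[1:]))
    cand = set(zip(candidate, candidate[1:]))
    return sum(bg in cand for bg in ref) / len(ref) if ref else 0.0

def best_fit(ground_truth_sentence, pool):
    """Highest fit of any sentence in the pool to one ground-truth sentence."""
    return max((fit(s, ground_truth_sentence) for s in pool), default=0.0)

def normalized_best_fit(ground_truth, retrieved, all_sentences):
    """Mean over ground-truth sentences of best fit among the retrieved
    sentences, divided by the best fit achievable with any sentence of the
    period. Ground-truth sentences that nothing matches are skipped."""
    ratios = []
    for g in ground_truth:
        ceiling = best_fit(g, all_sentences)
        if ceiling > 0:
            ratios.append(best_fit(g, retrieved) / ceiling)
    return sum(ratios) / len(ratios) if ratios else 0.0

# Hypothetical period t: three sentences, one ground-truth sentence.
all_sentences = [
    "britney shaved her head on friday".split(),
    "she shaved her head".split(),
    "the weather was fine".split(),
]
ground_truth = ["britney shaved her head".split()]
retrieved = [all_sentences[1], all_sentences[2]]

value = normalized_best_fit(ground_truth, retrieved, all_sentences)
```

The normalization means a method is only penalized for missing sentences that actually exist in the period: retrieving every sentence of t yields a value of 1.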
Our approach (4): “Precision-oriented” aggregate measure maxMP • Compares the retrieved sentences of method I and of method II against the ground-truth sentences • Method II has a better chance of good matches • Scale maxMR by [factor shown in the original figure]
Case study experiment: Data & settings • Corpus 1: Crime case • 21 weeks, 306 documents, 31 ground-truth sentences • Corpus 2: Celebrity reporting • 8 weeks, 3000 documents, 19 ground-truth sentences • Corpora available at https://sites.google.com/site/subasicilija/ttm-evaluation • M1: a keyword representation method: Kleinberg’s bursty words • M2: a group representation method: Mei & Zhai’s temporal text mining • M3: a combo representation method: STORIES
Results: Top group method comparison • Plots of maxMR and maxMP (two slides)
Results: Query generation comparison • Plots of maxMP and maxMR (two slides)
Summary • First cross-method evaluation framework for Temporal Text Mining methods with different pattern types • Experimental investigation of 3 TTM types • Results: • different methods have different strengths and weaknesses • M3/named entities: most robust method across settings • M3 variants > M1, M2 in “precision-oriented” measures • specific query generation improves “precision-oriented” results, especially for M1 and M2 • results depend on the corpus
Future work • Standardized, bigger, more varied datasets • Establish a baseline (ROUGE originally for longer text sequences) • Explore possible sources of bias for/against specific methods • User studies (in progress)