CMU TDT Report
12-13 November 2001
The CMU TDT Team: Jaime Carbonell, Yiming Yang, Ralf Brown, Chun Jin, Jian Zhang
Language Technologies Institute, CMU

Time Line for TDT Activities
• (Re)Start: Summer 2001
• Baseline FSD, Link, Det: Sept 2001
• Evaluation (of baseline): Oct 2001
• New Techniques: Nov 2001 – Onwards
  • Topic-conditional Novelty
  • Situated NEs (all tasks)
  • Source-conditional interpolated training

Baseline FSD Method
• (Unconditional) Dissimilarity with Past
• Decision threshold on most-similar story
• (Linear) temporal decay
• Length-filter (for teasers)
• Cosine similarity with standard weights: [formula not preserved in source]
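
The baseline steps listed above can be sketched roughly as follows. This is a minimal illustration, not the CMU system: the weighting `1 + log(tf)` is only an assumed stand-in for the "standard weights" (the slide's formula is not available), and the function names, the `threshold`, `window`, and `min_length` values, and the choice to measure age in story positions rather than clock time are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weigh(tokens):
    """Assumed term weighting (1 + log tf); the slide's formula is unknown."""
    return {t: 1.0 + math.log(c) for t, c in Counter(tokens).items()}


def first_story_flags(stories, threshold=0.3, window=100, min_length=5):
    """Return one bool per story: True if it is flagged as a first story.

    stories: token lists in arrival order.
    """
    past = []   # (position, vector) of stories already seen
    flags = []
    for i, tokens in enumerate(stories):
        # Length filter: very short items (teasers) are never flagged
        # and are not kept for later comparisons.
        if len(tokens) < min_length:
            flags.append(False)
            continue
        vec = weigh(tokens)
        best = 0.0
        for j, old in past:
            # Linear temporal decay: older stories count for less.
            decay = max(0.0, 1.0 - (i - j) / window)
            best = max(best, decay * cosine(vec, old))
        # Decision threshold on the most-similar past story.
        flags.append(best < threshold)
        past.append((i, vec))
    return flags


stories = [
    "plane crash near new york investigators search".split(),
    "investigators search plane crash site near new york".split(),
    "stock markets fall sharply across asia on currency fears".split(),
    "coming up next".split(),
]
print(first_story_flags(stories))  # [True, False, True, False]
```

Shrinking `window` makes the decay steeper, so a story that closely repeats an old one can still be flagged as new once enough stories have passed in between.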

FSD Observations
• Cross-site comparable baselines (cost = 0.7)
• Data/labeling issues (from error analysis)
  • “Events-vs-Topics” issue (e.g. Asia crisis)
  • A few mislabeled stories wreak havoc for FSD
  • Eager auto-segmentation a problem (misses)
• Recommendations for TDT labeling
  • FSD on true events, or events within topic(s)
  • Change auto-segmentation optimality criterion ??
• Recommendations for TDT researchers
  • Keep working hard on FSD – not cracked yet

New FSD Directions
• Topic-conditional models
  • E.g. “airplane,” “investigation,” “FAA,” “FBI,” “casualties” indicate the topic, not the event
  • “TWA 800,” “March 12, 1997” indicate the event
  • First categorize into topic, then use maximally-discriminative terms within topic
• Rely on situated named entities
  • E.g. “Arcan as victim,” “Sharon as peacemaker”
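
One way to picture the topic-conditional idea is to re-weight terms using only the stories already assigned to a topic, so that words shared by the whole topic stop contributing to similarity. The sketch below is an assumption-laden illustration: it takes the topic assignment as given, approximates "maximally-discriminative terms" with a smoothed within-topic IDF, and uses hypothetical function names and toy stories that do not come from the slides.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def topic_weights(topic_stories):
    """Build a within-topic IDF weight function (an assumed proxy).

    A term found in every story of the topic gets weight 0;
    a term never seen in the topic gets the largest weight.
    """
    n = len(topic_stories)
    df = Counter(t for tokens in topic_stories for t in set(tokens))

    def weight(term):
        return math.log((n + 1) / (df.get(term, 0) + 1))

    return weight


def max_similarity(story, past, weight=lambda term: 1.0):
    """Highest cosine between `story` and any story in `past` (non-empty)."""
    vec = {t: weight(t) for t in set(story)}
    return max(cosine(vec, {t: weight(t) for t in set(old)}) for old in past)


# Toy stories already categorized into one "air crash" topic.
topic_past = [
    "airplane crash investigation faa twa 800".split(),
    "airplane crash investigation fbi twa 800".split(),
    "airplane crash investigation faa swissair 111".split(),
]
new_story = "airplane crash investigation faa valujet 592".split()

plain = max_similarity(new_story, topic_past)
conditional = max_similarity(new_story, topic_past, topic_weights(topic_past))
print(round(plain, 3), round(conditional, 3))
```

With uniform weights the new story looks much like the old ones because it shares the topic vocabulary; with within-topic weights the shared words drop out and the event-specific terms dominate, so the story looks novel.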

Baseline Story-Link Detection
• Use same term-weighting and cosine similarity as FSD and detection
• Decision thresholds conditioned on language and source
  • Lower threshold for cross-language
  • Lower threshold for cross-ASR/newswire
• Thresholds trained on development set
  • 15% improvement over universal threshold
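
Conditioned thresholds of this kind could be trained along the following lines. This is a hedged sketch only: the story fields `lang` and `source_type`, the condition labels, the unweighted miss-plus-false-alarm count (a simplification, not the official TDT cost), and the toy development data are all assumptions made for illustration.

```python
from collections import defaultdict


def pair_condition(a, b):
    """Label a story pair by whether language and source type match."""
    lang = "same-lang" if a["lang"] == b["lang"] else "cross-lang"
    src = "same-src" if a["source_type"] == b["source_type"] else "cross-src"
    return (lang, src)


def errors_at(threshold, examples):
    """Count misses plus false alarms for (similarity, is_linked) examples."""
    misses = sum(1 for s, linked in examples if linked and s < threshold)
    false_alarms = sum(1 for s, linked in examples if not linked and s >= threshold)
    return misses + false_alarms


def train_thresholds(dev):
    """dev: list of (condition, similarity, is_linked) from a development set."""
    by_condition = defaultdict(list)
    for condition, sim, linked in dev:
        by_condition[condition].append((sim, linked))
    thresholds = {}
    for condition, examples in by_condition.items():
        # Try every observed similarity as a cut-off; keep the best one.
        candidates = sorted({s for s, _ in examples})
        thresholds[condition] = min(candidates, key=lambda t: errors_at(t, examples))
    return thresholds


def is_linked(sim, a, b, thresholds, default=0.4):
    """Apply the threshold for this pair's condition (or a universal default)."""
    return sim >= thresholds.get(pair_condition(a, b), default)


same = ("same-lang", "same-src")
cross = ("cross-lang", "same-src")
dev = [
    (same, 0.50, True), (same, 0.45, True), (same, 0.30, False), (same, 0.20, False),
    (cross, 0.25, True), (cross, 0.20, True), (cross, 0.10, False), (cross, 0.05, False),
]
thresholds = train_thresholds(dev)
print(thresholds[same], thresholds[cross])  # 0.45 0.2

english = {"lang": "en", "source_type": "newswire"}
mandarin = {"lang": "zh", "source_type": "newswire"}
print(is_linked(0.3, english, mandarin, thresholds))  # True
print(is_linked(0.3, english, dict(english), thresholds))  # False
```

In this toy data the cross-language pairs have systematically lower similarities, so the trained cross-language threshold comes out lower, which is the pattern the slide describes.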

CMU Detection
• Incremental Retrospective Clustering
• Group-Average in Forward Deferral Window
• Same cosine similarity and term weights as FSD
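
The slide gives no algorithmic detail, so the following is only one plausible reading of incremental group-average clustering with a deferral window: stories are buffered in fixed-size windows, clustered among themselves by group-average linkage, and the resulting groups are then attached to older clusters or kept as new ones. The window size, threshold, raw term-frequency weights, and the linkage definition (mean cosine over cross-cluster pairs) are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_stream(stories, window=3, threshold=0.3):
    """Cluster token lists in arrival order; returns lists of story indices."""
    vecs = [dict(Counter(tokens)) for tokens in stories]

    def link(a, b):
        # Group-average linkage: mean cosine over all cross-cluster pairs.
        return sum(cosine(vecs[x], vecs[y]) for x in a for y in b) / (len(a) * len(b))

    clusters = []
    for start in range(0, len(vecs), window):
        # Decisions for these stories are deferred until the window is full.
        batch = [[i] for i in range(start, min(start + window, len(vecs)))]

        # Step 1: agglomerate inside the window until no pair is close enough.
        while len(batch) > 1:
            i, j = max(combinations(range(len(batch)), 2),
                       key=lambda p: link(batch[p[0]], batch[p[1]]))
            if link(batch[i], batch[j]) < threshold:
                break
            batch[i].extend(batch.pop(j))  # j > i, so index i stays valid

        # Step 2: attach each window group to the closest existing cluster,
        # or start a new cluster if nothing is similar enough.
        for group in batch:
            best = max(clusters, key=lambda c: link(c, group), default=None)
            if best is not None and link(best, group) >= threshold:
                best.extend(group)
            else:
                clusters.append(group)
    return clusters


stories = [
    "plane crash near new york".split(),
    "stock markets fall in asia".split(),
    "investigators search plane crash site".split(),
    "asia stock markets fall again".split(),
    "plane crash investigators report".split(),
]
print(cluster_stream(stories))  # [[0, 2, 4], [1, 3]]
```

A larger window defers decisions longer but lets more evidence accumulate before a story is committed to a cluster.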