RMIT at TDT 2003

Nicholas Lester and Hugh E. Williams
Search Engine Group (SEG), RMIT

RMIT Topic Detection and Tracking Overview
• RMIT participated in the tracking and detection tasks this year
• Presentation overview:
  • Tracking system overview
  • Tracking normalisation work
  • Tracking results
  • Detection system overview
  • Detection results
  • Conclusions and future goals

Topic Tracking in 2002
• Language-model based, scored by log-likelihood ratio (LLR)
• Stemming and stopping (stoplist of ~500 English words)
• Unsupervised adaptation
• Query expansion
• First-M words (each story summarised by its first M words)
• Provided transcriptions and translations used for multilingual tracking
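As a rough illustration of LLR scoring, the sketch below assumes unigram language models, a topic model linearly interpolated with a background model (Jelinek-Mercer smoothing, weight `lam`), a tiny hypothetical stoplist standing in for the ~500-word list, and an optional first-M truncation. Stemming is omitted. The names `preprocess`, `unigram_model` and `llr_score` are illustrative only, not RMIT's code.

```python
import math
import re
from collections import Counter

# Hypothetical toy stoplist; the RMIT system used roughly 500 English words.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "on", "for"}

def preprocess(text, first_m=None):
    """Lowercase, tokenise, remove stopwords, optionally keep only the first M terms.

    Stemming is left out to keep the sketch dependency-free.
    """
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return tokens[:first_m] if first_m is not None else tokens

def unigram_model(tokens):
    """Return (term counts, total term count) for a list of tokens."""
    counts = Counter(tokens)
    return counts, sum(counts.values())

def llr_score(story_tokens, topic_model, background_model, lam=0.5):
    """Sum over story terms of log(P(w | topic) / P(w | background)).

    The topic model is interpolated with the background model so that terms
    unseen in the topic still receive nonzero probability.
    """
    t_counts, t_total = topic_model
    b_counts, b_total = background_model
    score = 0.0
    for w in story_tokens:
        p_bg = b_counts[w] / b_total
        if p_bg == 0:
            continue  # skip terms absent from the background collection
        p_topic = lam * t_counts[w] / t_total + (1 - lam) * p_bg
        score += math.log(p_topic / p_bg)
    return score
```

A story on the topic scores above zero, while an unrelated story scores below it. Many systems also divide by story length so scores from long and short stories are comparable; that step is left out here.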

Topic Tracking in 2003
• Continued to use the provided transcriptions and translations
• Discontinued query expansion
• Discontinued first-M summarisation
• Bug fixes and extensive experimentation
• Added source (and hence language) normalisation and topic normalisation

The need for normalisation

Normalisation approach
• Normalisation has been the subject of research by other groups
• The most common approach models scores as Gaussian, using statistics from a previous corpus
• Our approach is also Gaussian, but maintains source and topic statistics on the fly
  • Source normalisation
  • Topic normalisation
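The slides do not give the normalisation formulas. What follows is a minimal sketch assuming z-score normalisation, (score − mean) / std, with the mean and variance per key (a source or a topic) updated incrementally using Welford's online algorithm. The class names and the choice to include the current score in the statistics before normalising it are hypothetical.

```python
import math
from collections import defaultdict

class RunningStats:
    """Welford's online algorithm for a running mean and variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; reported as 0 until two scores are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

class OnlineNormaliser:
    """Gaussian (z-score) normalisation with statistics kept separately per key."""

    def __init__(self):
        self.stats = defaultdict(RunningStats)

    def normalise(self, key, score):
        s = self.stats[key]
        s.update(score)  # assumption: the current score joins the statistics first
        sd = s.std()
        return (score - s.mean) / sd if sd > 0 else 0.0
```

One possible composition is to normalise by source and then by topic, e.g. `topic_norm.normalise(topic_id, source_norm.normalise(source_id, raw_score))`. Keying on source also separates languages, because each source is in a single language.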

Effect of normalisation

Effect of normalisation (continued)

Tracking results in 2003

Topic Detection
• Our first attempt at the detection task
• Aiming to develop a competitive system over time
• We built a k-nearest-neighbour classifier
• Based on our text search engine
• Two-pass approach: index first, then categorise
• Although indexing comes first, the system does not, in spirit, use a deferral period
• Same input filtering capabilities as our tracking system (stemming, stopping, first-M)
• Produced non-hierarchical clusters
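The sketch below shows k-nearest-neighbour clustering in a single in-memory pass. It does not reproduce the two-pass, search-engine-based design. Stories are term lists. Each story joins the cluster most common among its k most similar earlier stories whose similarity meets a threshold. If no neighbour qualifies, it starts a new flat cluster. Cosine similarity is used only for simplicity (the slides list a cosine-style metric as future work, so the original measure differs). The values of `k` and `threshold` are illustrative assumptions.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_detect(stories, k=3, threshold=0.2):
    """Assign each story, in arrival order, to a flat (non-hierarchical) cluster."""
    vectors, labels = [], []
    next_label = 0
    for terms in stories:
        vec = Counter(terms)
        # The k most similar earlier stories, as (similarity, cluster label) pairs.
        neighbours = sorted(
            ((cosine(vec, v), labels[i]) for i, v in enumerate(vectors)),
            reverse=True,
        )[:k]
        votes = Counter(label for sim, label in neighbours if sim >= threshold)
        if votes:
            label = votes.most_common(1)[0][0]  # majority cluster among neighbours
        else:
            label = next_label  # no close neighbour: start a new topic
            next_label += 1
        vectors.append(vec)
        labels.append(label)
    return labels
```

For example, `knn_detect([["stocks", "market", "rose"], ["election", "vote", "results"], ["market", "stocks", "fell"], ["vote", "election", "count"]])` puts the two market stories in one cluster and the two election stories in another.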

Detection Results
• CDET of 0.6230 using conditions SR=nwt+bnman TR=mul,eng boundary DEF=X (close to the primary conditions)
• Lots of things to improve upon:
  • Hierarchical clustering
  • Use of a query-weight-normalised metric (probably cosine)
  • Removal of implementation limits
  • More experimentation with existing features
  • Use of incremental indexing
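For reference, TDT detection cost weights misses and false alarms and is usually normalised by the cost of the better trivial system (one that always answers yes or always answers no). The sketch below assumes the commonly used TDT parameters C_miss = 1, C_fa = 0.1 and P_target = 0.02. The slide does not state the parameters behind the 0.6230 figure.

```python
def normalised_detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """TDT-style detection cost divided by the cost of the better trivial system.

    C_det = C_miss * P_miss * P_target + C_fa * P_fa * (1 - P_target)
    """
    c_det = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
    return c_det / min(c_miss * p_target, c_fa * (1 - p_target))
```

With these defaults the normalised cost reduces to P_miss + 4.9 × P_fa. A false alarm is therefore penalised 4.9 times as heavily as a miss.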

Conclusions
• Improved our tracking system
• Implemented a normalisation scheme
• System performs well
• About a year behind the best team
• A first attempt at detection
• We hope to be more competitive next year

The End
Questions?
