RMIT at TDT 2003 Nicholas Lester and Hugh E. Williams Search Engine Group (SEG) RMIT
RMIT Topic Detection and Tracking Overview • RMIT participated in the tracking and detection tasks this year • Presentation Overview: • Tracking system overview • Tracking normalisation work • Tracking results • Detection system overview • Detection results • Conclusions and Future Goals
Topic Tracking in 2002 • Language model based (LLR) • Stemming, stopping (~500 English words) • Unsupervised adaptation • Query expansion • First-M words • Provided transcriptions and translations used for multilingual tracking
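A minimal sketch of language-model tracking with a log-likelihood ratio (LLR), assuming add-alpha smoothed unigram models and a length-normalised score. The function names, the smoothing and the toy vocabulary are illustrative assumptions, not the RMIT implementation.

```python
import math
from collections import Counter

def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary (assumed smoothing)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def llr_score(story_tokens, topic_model, background_model):
    """Mean log ratio of topic to background probability over in-vocabulary words."""
    terms = [
        math.log(topic_model[w] / background_model[w])
        for w in story_tokens
        if w in topic_model and w in background_model
    ]
    return sum(terms) / len(terms) if terms else 0.0

# Toy data (hypothetical): a background collection and one topic training story.
background = ["earthquake", "rescue", "election", "vote", "market", "stocks"]
vocab = set(background)
topic = unigram_model(["earthquake", "rescue", "earthquake"], vocab)
general = unigram_model(background, vocab)

on_topic = llr_score(["earthquake", "rescue"], topic, general)
off_topic = llr_score(["election", "vote"], topic, general)
```

A story is declared on-topic when its score exceeds a threshold; with unsupervised adaptation, confidently on-topic stories would also be folded back into the topic model.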
Topic Tracking in 2003 • Continued to use transcriptions and translations • Discontinued use of query expansion • Discontinued use of first-m summarisation • Bug fixes and lots of experimentation • Added source (hence language) and topic normalisation
Normalisation approach • Normalisation has been the subject of research by other groups • The most common approach is to use a Gaussian model of scores, with statistics taken from a previous corpus • Our approach is also Gaussian, but keeps source and topic statistics on the fly • Source normalisation: • Topic normalisation
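The slide's own normalisation formulas did not survive in this transcript. As a hedged illustration of the general idea only, the sketch below keeps a running mean and standard deviation per key (a source or a topic) and converts each raw score to a z-score. The class names, the use of Welford's update, and the choice to update statistics before normalising are all assumptions.

```python
import math
from collections import defaultdict

class RunningStats:
    """Welford's online mean and variance for one source or topic."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Population standard deviation; 0.0 until two scores have been seen.
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0

class OnTheFlyNormaliser:
    """Gaussian (z-score) normalisation with statistics gathered as scores arrive."""
    def __init__(self):
        self.stats = defaultdict(RunningStats)

    def normalise(self, key, score):
        s = self.stats[key]
        s.update(score)  # assumed ordering: include the current score first
        sd = s.std()
        return (score - s.mean) / sd if sd > 0 else 0.0

norm = OnTheFlyNormaliser()
z1 = norm.normalise("nwt", 1.0)
z2 = norm.normalise("nwt", 2.0)
z3 = norm.normalise("nwt", 3.0)
other = norm.normalise("bnman", 10.0)  # a different key has independent statistics
```

Keying one normaliser by source and another by topic would give the two normalisation steps named on the slide without needing statistics from a previous corpus.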
Topic Detection • Our first attempt at the detection task • Aiming to develop a competitive system (over time) • We built a k nearest neighbour classifier • Based on our text search engine • Two pass approach: index first, then categorise • But doesn’t use a deferral period, in spirit • Has the same input filtering capabilities as our tracking system (stemming, stopping, first-m) • Produced non-hierarchical clusters
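A rough sketch of flat (non-hierarchical) k-nearest-neighbour detection, assuming each story is compared only with earlier stories and joins the majority cluster of its close neighbours, otherwise starting a new cluster. Jaccard overlap stands in for the search engine's similarity measure, and `k` and `threshold` are hypothetical parameters.

```python
from collections import Counter

def jaccard(a, b):
    """Set overlap between two token lists (a stand-in similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def knn_detect(stories, k=3, threshold=0.2):
    """Assign a cluster label to each story in arrival order."""
    labels = []
    next_cluster = 0
    for i, story in enumerate(stories):
        # Rank all earlier stories by similarity to the current one.
        sims = sorted(((jaccard(story, stories[j]), j) for j in range(i)), reverse=True)
        neighbours = [(s, j) for s, j in sims[:k] if s >= threshold]
        if neighbours:
            # Majority vote over the clusters of the nearest earlier stories.
            votes = Counter(labels[j] for _, j in neighbours)
            labels.append(votes.most_common(1)[0][0])
        else:
            # Nothing similar enough has been seen: start a new cluster.
            labels.append(next_cluster)
            next_cluster += 1
    return labels

stories = [
    ["quake", "rescue", "city"],
    ["election", "vote", "poll"],
    ["quake", "rescue", "aid"],
    ["vote", "poll", "count"],
]
labels = knn_detect(stories)
```

Because each story looks only backwards, the whole collection can be indexed in a first pass without later stories influencing earlier decisions.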
Detection Results • CDET of 0.6230 using conditions SR=nwt+bnman TR=mul,eng boundary DEF=X (almost primary conditions) • Lots of things to improve upon • Hierarchical clustering • Use of a query weight normalised metric (probably cosine) • Removal of implementation limits • Experiment more with existing features • Use of incremental indexing
Conclusions • Improved our tracking system • Implemented a normalisation scheme • System performs well • About a year behind best team • A first attempt at detection • We hope to be more competitive next year
The End Questions?