
RMIT at TDT 2003

RMIT at TDT 2003. Nicholas Lester and Hugh E. Williams, Search Engine Group (SEG), RMIT. RMIT Topic Detection and Tracking Overview. RMIT participated in the tracking and detection tasks this year. Presentation Overview: tracking system overview, tracking normalisation work, tracking results.

Presentation Transcript


  1. RMIT at TDT 2003
     Nicholas Lester and Hugh E. Williams
     Search Engine Group (SEG), RMIT

  2. RMIT Topic Detection and Tracking Overview
     • RMIT participated in the tracking and detection tasks this year
     • Presentation overview:
       • Tracking system overview
       • Tracking normalisation work
       • Tracking results
       • Detection system overview
       • Detection results
       • Conclusions and future goals

  3. Topic Tracking in 2002
     • Language-model based (LLR, log-likelihood ratio)
     • Stemming, stopping (~500 English words)
     • Unsupervised adaptation
     • Query expansion
     • First-M words
     • Provided transcriptions and translations used for multilingual tracking
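The slide names the ingredients of the 2002 tracker without giving formulas. As a rough illustration only, the sketch below scores a story with a length-normalised log-likelihood ratio of a smoothed topic unigram model against a background model, with stopping, optional first-M truncation and threshold-based unsupervised adaptation. The stopword list, the Jelinek-Mercer weight `lam`, the probability `floor` and the helper names (`tokens`, `unigram`, `llr_score`, `adapt`) are assumptions; stemming is omitted, and this is not RMIT's actual implementation.

```python
import math
from collections import Counter

# Hypothetical stopword list for illustration; the 2002 system used about 500 English words.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "while"}


def tokens(text, first_m=None):
    """Lower-case, split on whitespace, drop stopwords, optionally keep only the first M terms."""
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    return words[:first_m] if first_m is not None else words


def unigram(counts):
    """Maximum-likelihood unigram model from a Counter of term counts."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def llr_score(story, topic_model, background, lam=0.5, floor=1e-9, first_m=None):
    """Length-normalised log-likelihood ratio of a story under a smoothed topic model
    versus the background model. Positive means the topic model explains it better."""
    terms = tokens(story, first_m)
    if not terms:
        return 0.0
    score = 0.0
    for w in terms:
        # floor: crude probability for words the background model never saw (an assumption)
        p_bg = background.get(w, floor)
        # Jelinek-Mercer smoothing of the topic model with the background model
        p_topic = lam * topic_model.get(w, 0.0) + (1 - lam) * p_bg
        score += math.log(p_topic / p_bg)
    return score / len(terms)


def adapt(topic_counts, story, score, threshold):
    """Unsupervised adaptation: add a confidently on-topic story's terms to the topic counts.
    The caller rebuilds the topic model with unigram(topic_counts) afterwards."""
    accepted = score > threshold
    if accepted:
        topic_counts.update(tokens(story))
    return accepted
```

A high threshold for `adapt` keeps the topic model from drifting on marginal stories; the right value would have to be tuned, as the slide gives none.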

  4. Topic Tracking in 2003
     • Continued to use transcriptions and translations
     • Discontinued use of query expansion
     • Discontinued use of first-M summarisation
     • Bug fixes and lots of experimentation
     • Added source (hence language) and topic normalisation

  5. The need for normalisation

  6. Normalisation approach
     • Normalisation has been the subject of research by other groups
     • The most common approach uses a Gaussian model of scores, with statistics taken from a previous corpus
     • Our approach was also Gaussian, but kept source and topic statistics on the fly
     • Source normalisation:
     • Topic normalisation:
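The source and topic normalisation formulas did not survive in this transcript. A minimal sketch of the general idea, assuming each raw score is z-normalised against a running Gaussian (mean and sample standard deviation kept with Welford's online algorithm) for its source, and the result is then z-normalised against a running Gaussian for its topic. The class names, the source-then-topic ordering and updating the statistics before normalising are all assumptions, not the scheme shown on the slide.

```python
import math
from collections import defaultdict


class RunningGaussian:
    """Running mean and variance via Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; reported as 0.0 with fewer than two observations.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def z(self, x):
        s = self.std()
        return (x - self.mean) / s if s > 0 else 0.0


class OnTheFlyNormaliser:
    """Hypothetical combination: z-normalise by source, then by topic."""

    def __init__(self):
        self.by_source = defaultdict(RunningGaussian)
        self.by_topic = defaultdict(RunningGaussian)

    def normalise(self, raw, source, topic):
        src = self.by_source[source]
        src.update(raw)
        z_src = src.z(raw)
        top = self.by_topic[topic]
        top.update(z_src)
        return top.z(z_src)
```

Keeping statistics on the fly removes the dependence on a previous corpus, at the cost of unstable estimates early on: each stage outputs 0.0 while it has seen fewer than two scores.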

  7. Effect of normalisation

  8. Effect of normalisation (continued)

  9. Tracking results in 2003

  10. Topic Detection
     • Our first attempt at the detection task
     • Aiming to develop a competitive system (over time)
     • We built a k-nearest-neighbour classifier
     • Based on our text search engine
     • Two-pass approach: index first, then categorise
     • Though, in spirit, it does not use a deferral period
     • Has the same input filtering capabilities as our tracking system (stemming, stopping, first-M)
     • Produced non-hierarchical clusters
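A minimal sketch of flat kNN clustering in the spirit of the two-pass design: pass 1 turns every story into a term-frequency vector, and pass 2 walks the stories in arrival order, comparing each one only with earlier stories so that no future stories are used. Each story joins the cluster favoured by a similarity-weighted vote of its `k` nearest earlier neighbours, or starts a new cluster if none reaches `threshold`. Cosine similarity, `k`, `threshold` and the function names are assumptions for illustration; the submitted system ranked neighbours with RMIT's own search engine, and cosine is listed on the results slide only as a possible improvement.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(c * b.get(w, 0) for w, c in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_detect(stories, k=3, threshold=0.2):
    """Assign each story, in arrival order, to a flat (non-hierarchical) cluster label."""
    vectors = [Counter(s.lower().split()) for s in stories]  # pass 1: "index" every story
    labels = []
    next_label = 0
    for i, vec in enumerate(vectors):  # pass 2: categorise against earlier stories only
        sims = sorted(((cosine(vec, vectors[j]), j) for j in range(i)), reverse=True)[:k]
        neighbours = [(s, j) for s, j in sims if s >= threshold]
        if not neighbours:
            labels.append(next_label)  # nothing similar enough: start a new cluster
            next_label += 1
            continue
        votes = Counter()
        for s, j in neighbours:
            votes[labels[j]] += s  # similarity-weighted vote for the neighbour's cluster
        labels.append(votes.most_common(1)[0][0])
    return labels
```

For example, two storm stories and two market stories interleaved in time end up in two clusters, with the market stories starting the second one.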

  11. Detection Results
     • CDET of 0.6230 using conditions SR=nwt+bnman TR=mul,eng boundary DEF=X (almost the primary conditions)
     • Lots of things to improve upon:
       • Hierarchical clustering
       • Use of a query-weight-normalised metric (probably cosine)
       • Removal of implementation limits
       • More experimentation with existing features
       • Use of incremental indexing
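For reference, TDT scores detection with a weighted cost of misses and false alarms, usually reported in normalised form. A sketch assuming the commonly used TDT constants C_Miss = 1.0, C_FA = 0.1 and P_target = 0.02; the slide does not say which form the 0.6230 figure takes, and the function name is hypothetical.

```python
def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02, normalise=True):
    """TDT-style detection cost; the default constants are assumed, not taken from the slide.

    C_Det = C_Miss * P_Miss * P_target + C_FA * P_FA * (1 - P_target)
    """
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
    if normalise:
        # Divide by the cost of the better trivial system:
        # always answering NO costs c_miss * p_target, always YES costs c_fa * (1 - p_target).
        cost /= min(c_miss * p_target, c_fa * (1 - p_target))
    return cost
```

With those constants the normalised cost simplifies to P_Miss + 4.9 · P_FA, so a 10% false-alarm rate costs as much as a 49% miss rate, and a normalised cost of 1.0 is no better than never detecting anything.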

  12. Conclusions
     • Improved our tracking system
     • Implemented a normalisation scheme
     • System performs well
     • About a year behind the best team
     • A first attempt at detection
     • We hope to be more competitive next year

  13. The End
     Questions?
