CMUDIR group: TDT Supervised Tracking. Yi Zhang (with Jamie Callan). Carnegie Mellon University, School of Computer Science, Language Technologies Institute.
Overview • Outsider’s experience • First participation in TDT • Supervised tracking task • Combine Rocchio and logistic regression models for bias-variance trade-off • Performance analysis • Work in progress • Conclusion
First Time in TDT • About our group: CMUDIR • Several years of research in TREC adaptive filtering task • First time in TDT • Supervised tracking is different from TREC adaptive filtering • Burstiness, topic definition… • Supervised tracking may be similar to TREC adaptive filtering • Comparatively easy for us to enter • Much help from TDTers made our first participation possible
[Diagram: Tracking System. Labels: Topic (initialization); document stream; Tracking System (Binary Classifier, Utility Function); delivered docs; Document Labels; Accumulated docs; Logistic_Rocchio Learning]
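The loop in the diagram can be sketched roughly as below. This is a toy illustration, not the CMUDIR system: `track`, `overlap`, `get_label`, `update`, the term-overlap score, and the threshold rule are all hypothetical names and choices made for the example, and the sample documents and judgments are invented. The point it shows is that only delivered documents receive labels and feed back into learning.

```python
def track(stream, score, threshold, get_label, update):
    """Deliver documents scoring at or above the threshold.

    Only delivered documents are labeled, so only they are accumulated
    as training data for the next update.
    """
    delivered, accumulated = [], []
    for doc in stream:
        if score(doc) >= threshold:
            delivered.append(doc)
            accumulated.append((doc, get_label(doc)))
            score, threshold = update(accumulated, score, threshold)
    return delivered, accumulated


# Toy data (invented): documents are bags of terms, with hypothetical judgments.
topic_terms = {"earthquake", "rescue"}
stream = [
    frozenset({"earthquake", "rescue", "city"}),
    frozenset({"football", "score"}),
    frozenset({"earthquake", "insurance"}),
    frozenset({"rescue", "dog"}),
]
judgments = dict(zip(stream, [True, False, False, True]))


def overlap(doc):
    # Toy score: number of topic terms in the document.
    return len(doc & topic_terms)


def get_label(doc):
    # Stand-in for the supervision given on a delivered document.
    return judgments[doc]


def update(accumulated, score, threshold):
    # Toy learning rule: raise the threshold after an off-topic delivery.
    _, label = accumulated[-1]
    return score, (threshold if label else threshold + 1)


delivered, accumulated = track(stream, overlap, 1, get_label, update)
```

In this run the last document is on topic but is never delivered, so it is never labeled; the learner sees labels only for what it chose to deliver.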
Profile Learning: Using Bayesian Prior to Combine Classifiers (Zhang SIGIR 2004) • Step 1: Rocchio + threshold => wR • Step 2: [content missing in the source] • Step 3: Use wm as logistic regression prior mean • Step 4: Estimate the posterior distribution of the parameters, and use wMAP = w* • [Figure labels: Document space (N); Logistic Regression Parameter space (N+1)]
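A minimal sketch of the idea in these steps, under stated assumptions: a Rocchio profile plus a threshold gives an (N+1)-dimensional vector wR, which serves as the mean of a Gaussian prior for logistic regression, and the MAP estimate is found by gradient ascent. Step 2 is missing from the slide, so the sketch simply takes wm = wR; that shortcut, the isotropic prior variance `sigma2`, the Rocchio weight `beta`, the threshold value, the optimizer settings, and all function names are assumptions for illustration rather than the method of the cited paper.

```python
import math


def rocchio(pos, neg, beta=0.25):
    """Rocchio profile: positive centroid minus beta times negative centroid."""
    n = len(pos[0])
    profile = []
    for j in range(n):
        p = sum(d[j] for d in pos) / len(pos)
        q = sum(d[j] for d in neg) / len(neg) if neg else 0.0
        profile.append(p - beta * q)
    return profile


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    # Linear score with the bias stored as the last weight (N+1 parameters).
    return sum(a * b for a, b in zip(w, x + [1.0]))


def map_logistic(docs, labels, w_m, sigma2=1.0, lr=0.1, steps=500):
    """Gradient ascent on log-likelihood plus Gaussian log-prior N(w_m, sigma2*I)."""
    w = list(w_m)
    for _ in range(steps):
        # Gradient of the log-prior pulls w back toward the prior mean.
        grad = [-(wj - mj) / sigma2 for wj, mj in zip(w, w_m)]
        # Gradient of the log-likelihood of the labeled documents.
        for x, y in zip(docs, labels):
            err = y - sigmoid(score(w, x))
            for j, xj in enumerate(x + [1.0]):
                grad[j] += err * xj
        w = [wj + lr * g for wj, g in zip(w, grad)]
    return w


# Toy data (invented): two-term document vectors.
pos = [[1.0, 0.0], [0.8, 0.1]]
neg = [[0.0, 1.0]]

w_R = rocchio(pos, neg) + [-0.3]   # Step 1: profile plus threshold as bias
w_m = w_R                          # Step 2 is not in the source; assumed identity
w_map = map_logistic(pos + neg, [1, 1, 0], w_m)   # Steps 3 and 4
```

With no training documents the MAP estimate stays at the prior mean, so the Rocchio solution is the fallback; as labeled documents accumulate, the likelihood term moves the weights away from it.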
Performance • Efficiency • 2-3 hours for TDT5 supervised tracking task for utility optimization • 1 CPU (P4, 2.4 GHz), 512 MB RAM • Effectiveness (CMU8) • Utility: 449.17 • Scaled utility: 0.7281
More Work on Filtering (Supervised Tracking) • Challenge: limited user supervision • What does a human do? / Our solution • Use expert knowledge: Bayesian Prior (SIGIR 04) • Ask questions: Bayesian Active Learning (ICML 03) • Use multiple forms of evidence: Graphical Models (In Progress) • Unified framework: Bayesian Graphical Models
Summary • Outsider’s view • Supervised tracking task is an easy entry into TDT for people already familiar with TREC adaptive filtering • A TREC-style system can do well • Low effort with good results • TREC adaptive filtering task is very similar to TDT supervised tracking task • Used the TREC filtering task system for supervised tracking • Effort focused on understanding the TDT data format and converting the data into a form our system can handle • Very good performance for utility measure optimization • Other issues • Biased sampling problem • Only have labels for documents delivered • Burstiness