Adaptive Topic Tracking at Maryland

Tamer Elsayed, Douglas W. Oard, David Doermann (University of Maryland, College Park)
Gary Kuhn (National Security Agency)
TDT-2004

Outline
• Results
• System design
• Interpreting the results
• Next steps

Non-Adaptive Topic Tracking
• Cost=0.6507
• No score normalization
• [DET plot: bottom left is better]

Adaptive Topic Tracking
• Cost=0.2438
• No score normalization, unjudged treated as firmly off-topic

Adaptive Topic Tracking
• Cost=0.3789
• One-pass score normalization, unjudged treated as firmly off-topic
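The deck does not define the Cost values reported above. Assuming they are the normalized detection cost commonly used in TDT evaluations, a minimal sketch of that computation follows. The function name and the default constants (miss cost 1.0, false-alarm cost 0.1, target prior 0.02) are assumptions and are not taken from the slides.

```python
def normalized_detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Normalized detection cost; lower is better.

    The default constants are assumed, not stated in the slides.
    """
    # Expected cost of the system's misses and false alarms.
    c_det = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    # Cost of the better trivial system (always "no" or always "yes").
    norm = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return c_det / norm


if __name__ == "__main__":
    # Hypothetical operating point, for illustration only.
    print(normalized_detection_cost(p_miss=0.2, p_fa=0.01))
```

Under this definition, a system that rejects every story has a cost of exactly 1, so a cost below 1 means the tracker beats that trivial baseline.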

Non-Adaptive System Design
• Data: TDT-5, split into a training epoch and an evaluation epoch
• Compute log-odds n-gram weights
• Compute story scores

Log-Odds Term Weights
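The formula from this slide did not survive extraction. As an illustration only, one common way to compute log-odds n-gram weights from on-topic and off-topic training stories is sketched below. The word-level n-grams, the additive smoothing constant `alpha`, and all function names are assumptions, not the authors' stated method.

```python
import math
from collections import Counter


def ngrams(tokens, n=1):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def log_odds_weights(on_topic_docs, off_topic_docs, n=1, alpha=0.5):
    """Smoothed log-odds ratio of each n-gram, on-topic versus off-topic."""
    on = Counter(g for doc in on_topic_docs for g in ngrams(doc, n))
    off = Counter(g for doc in off_topic_docs for g in ngrams(doc, n))
    vocab = set(on) | set(off)
    # Reserve one extra smoothing slot for unseen n-grams so that no
    # probability estimate can reach exactly 1.
    slots = len(vocab) + 1
    on_total = sum(on.values()) + alpha * slots
    off_total = sum(off.values()) + alpha * slots
    weights = {}
    for g in vocab:
        p_on = (on[g] + alpha) / on_total
        p_off = (off[g] + alpha) / off_total
        weights[g] = (math.log(p_on / (1.0 - p_on))
                      - math.log(p_off / (1.0 - p_off)))
    return weights


if __name__ == "__main__":
    w = log_odds_weights([["earthquake", "hits", "city"]],
                         [["stocks", "fall", "city"]])
    for gram, weight in sorted(w.items()):
        print(gram, round(weight, 3))
```

In this sketch, n-grams that are more frequent in on-topic stories get positive weights, n-grams that are more frequent in off-topic stories get negative weights, and n-grams that are equally frequent in both get a weight near zero.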

Computing Story Scores
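The scoring formula on this slide was also lost. A minimal sketch, assuming a story score is simply the sum of the weights of the n-grams that occur in the story, might look like this. The additive form, the treatment of unknown n-grams as zero, and the names used are all assumptions.

```python
def ngrams(tokens, n=1):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def story_score(tokens, weights, n=1):
    """Sum the weights of all n-gram occurrences; unseen n-grams add 0."""
    return sum(weights.get(g, 0.0) for g in ngrams(tokens, n))


if __name__ == "__main__":
    # Hypothetical weights, for illustration only.
    weights = {("earthquake",): 1.2, ("stocks",): -1.2}
    story = ["earthquake", "damages", "city", "earthquake"]
    print(story_score(story, weights))
```

Because a raw sum grows with story length and varies from topic to topic, scores of this kind are hard to compare against a single threshold, which is one reason score normalization matters.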

Adaptive System Design
• Data: TDT-4 and TDT-5
• Epochs: extended training epoch, training epoch, evaluation epoch
• Compute log-odds n-gram weights
• Compute story scores
• Compute normalization factor
• Normalize story scores

Interpreting Non-Adaptive Results
• Lack of normalization probably hurt!
• What can we say about the effect of incomplete judgments?

Interpreting Adaptive Results
• Normalization hurt!
• One-pass design is the problem
• DET has limitations
• Changing the threshold changes our topic model!
• Threshold selection is now a critical path item
• How does judgment density affect the results?
• [Plots: not normalized vs. normalized]
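To make concrete why the threshold affects the topic model, here is a small sketch of one adaptation step. It assumes that feedback is only available for stories scoring at or above the threshold, and that stories without a judgment are placed in the off-topic set, as the results slides describe. The function and variable names are hypothetical.

```python
def adapt_training_sets(scored_stories, threshold, judgments):
    """Split stories that pass the threshold into new training sets.

    scored_stories: list of (story_id, score) pairs
    judgments: dict of story_id -> True (on-topic) or False (off-topic);
               a missing or None entry means the story is unjudged
    """
    on_topic, off_topic = [], []
    for story_id, score in scored_stories:
        if score < threshold:
            continue  # assumed: no feedback below the threshold
        if judgments.get(story_id) is True:
            on_topic.append(story_id)
        else:
            # Judged off-topic, or unjudged and treated as off-topic.
            off_topic.append(story_id)
    return on_topic, off_topic


if __name__ == "__main__":
    scored = [("a", 0.9), ("b", 0.6), ("c", 0.2)]
    judged = {"a": True, "b": None, "c": True}
    print(adapt_training_sets(scored, 0.5, judged))
    print(adapt_training_sets(scored, 0.0, judged))
```

Lowering the threshold in this sketch admits story "c" into the on-topic set, so each point on a DET curve would correspond to a differently trained model rather than to one model swept over thresholds.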

Next Steps
• Further explore normalization
• Implement continuous renormalization
• Tune parameters on devtest data
• Decide between TDT-5 and TDT-4
• Is incomplete judging harmful?
• Define richer training sets
• Explicit queries
• Many known on-topic/off-topic training stories
• Models of (imperfect) behavioral feedback

Our Favorite Quote of the Day
• “It takes time to get the implementation correct” [Yiming]
• We had 30 days from project initiation to non-adaptive submission
