The TREC-9 Adaptive Filtering track
(Coordinators: David Hull and Stephen Robertson)
Stephen Robertson
Microsoft Research Cambridge
ser@microsoft.com
Routing and filtering at TREC
Routing task (from TREC-1):
  Text topics
  Training set (documents with relevance judgements)
  Test set (new documents without relevance judgements)
  Time is implicit (training set is history, test set is future)
Routing and filtering at TREC
Adaptive filtering (from TREC-6):
  Text topics
  Time-stamped documents: switch on document stream at time 0 (little or no history)
  Any incoming document may be sent to user…
  … and any relevance judgement on that document may be used to modify the query…
  … but new query only applies to later documents
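The adaptive filtering protocol can be sketched as a single pass over a time-ordered stream. This is a minimal illustration, not any participant's system: the `Profile` class, its term-weight scoring, and the +0.5/-0.5 update rule are all hypothetical, and treating unjudged documents as non-relevant is an assumption. The only parts taken from the slide are the constraints: a judgement is seen only for a document that was sent to the user, and the modified query affects later documents only.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical query profile: weighted terms plus a retrieval threshold."""
    weights: dict
    threshold: float = 1.0

    def score(self, text):
        # Sum the weights of the distinct terms that occur in the document.
        return sum(self.weights.get(tok, 0.0) for tok in set(text.lower().split()))

    def update(self, text, relevant):
        # Toy adaptation rule (an assumption): nudge every term in the
        # judged document up if it was relevant, down otherwise.
        delta = 0.5 if relevant else -0.5
        for tok in set(text.lower().split()):
            self.weights[tok] = self.weights.get(tok, 0.0) + delta


def adaptive_filter(stream, profile, judgements):
    """Filter (timestamp, doc_id, text) tuples in time order.

    judgements maps doc_id -> bool and is consulted only for documents
    that were actually retrieved; missing entries count as non-relevant
    (an assumption of this sketch).
    """
    retrieved = []
    for timestamp, doc_id, text in sorted(stream):
        if profile.score(text) >= profile.threshold:
            retrieved.append(doc_id)
            # Feedback exists only because the document was sent to the user.
            profile.update(text, judgements.get(doc_id, False))
        # Decisions on earlier documents are final: the updated profile
        # is applied to later documents only.
    return retrieved
```

With a starting profile of `{"heart": 1.0}`, a relevant judgement on a first document "heart attack treatment" raises the weights of "attack" and "treatment" enough that a later document "attack treatment" is retrieved; a non-relevant judgement on the same first document suppresses it instead.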
Routing and filtering at TREC
TREC-9 track:
  Approx 350,000 Medline documents, covering approx 4¾ years (~6000 per month)
  Topic set OHSU: 63 Ohsumed queries
    Text topics with relevance judgements
  Topic set MSH: 4903 MeSH headings
    Scope notes with NLM assignments
  Topic set MSH-SMP: sample of 500 of these
Structure of the TREC-9 tasks
  Training set: first 9 months' data (ohsumed.87)
  Test set: remaining 4 years' data (ohsumed.88-91)
  4 tasks:
    Adaptive filtering
    Batch-adaptive filtering
    Batch filtering (no adaptation)
    Routing
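The training/test division above is a split on document date. A minimal sketch follows, assuming each document carries a two-digit year matching the `ohsumed.87` to `ohsumed.91` file suffixes; the tuple layout and the function name are hypothetical.

```python
TRAIN_YEARS = {87}              # ohsumed.87: the first 9 months of data
TEST_YEARS = {88, 89, 90, 91}   # ohsumed.88-91: the remaining 4 years


def split_by_year(docs):
    """Split (year, doc_id) pairs into training and test id lists.

    Documents from any other year are dropped.
    """
    train, test = [], []
    for year, doc_id in docs:
        if year in TRAIN_YEARS:
            train.append(doc_id)
        elif year in TEST_YEARS:
            test.append(doc_id)
    return train, test
```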
Structure of the TREC-9 tasks
Adaptive filtering initialization:
  Text topic + 2 relevant documents from training set (actually 2 for OHSU, 4 for MSH)
  idfs from training set can be used
Batch filtering/routing initialization:
  Text topic + any use of training set (including relevance judgements)
Adaptation (adaptive and batch-adaptive):
  Use judgements on documents previously returned
Measures
Utility:
  Score (credit) x for a relevant document, debit y for a non-relevant document retrieved
  TREC-9 utility: credit 2, debit 1
  T9U: Utility but with minimum (-100 or -400)
Precision-oriented measure:
  Target number of docs retrieved over whole period (=50 for TREC-9)
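The measures above reduce to a few lines of arithmetic. In the sketch below the function names are hypothetical, and the lower bound is left as a parameter because the slide gives two values (-100 or -400) without saying here which applies where. The slide also gives no formula for the precision-oriented measure; the form used here, relevant retrieved divided by the larger of the number retrieved and the target, is an assumption.

```python
def linear_utility(rel_retrieved, nonrel_retrieved, credit=2, debit=1):
    """Credit for each relevant document retrieved, debit for each non-relevant one."""
    return credit * rel_retrieved - debit * nonrel_retrieved


def t9u(rel_retrieved, nonrel_retrieved, floor=-100):
    """TREC-9 utility (credit 2, debit 1) bounded below by `floor`."""
    return max(linear_utility(rel_retrieved, nonrel_retrieved), floor)


def target_precision(rel_retrieved, nonrel_retrieved, target=50):
    """Assumed form of the precision-oriented measure.

    Retrieving fewer than `target` documents is penalised because the
    denominator never drops below the target.
    """
    retrieved = rel_retrieved + nonrel_retrieved
    return rel_retrieved / max(retrieved, target)
```

For example, a run that retrieves 10 relevant and 5 non-relevant documents scores 2 × 10 − 5 = 15, while a run that retrieves 500 non-relevant documents and nothing relevant is held at the floor rather than scoring −500.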