Information Retrieval and Extraction Final Project: Relevant Sentence Detection
Overview • Goal • Sentence-level topic relevance detection • Teams • 1 to 4 members per team • Deadline and Demo • 6/21
Topics and Collection • TREC 2004 Novelty Track • 10 topics and their relevance judgments for system training and parameter tuning • “TrainingTopics.txt” in the dataset • 3 to 5 test topics for the demo
Topic Example
<top>
<num>Number: N54
<title> Firestone Tire Recall
<toptype> Event
<desc>Description: The widespread affects of the Firestone tire recall
<narr>Narrative: Opinion of the public, personal, business, or company as to the general scope of the recall (too much, not enough); as to the type of tires or vehicle that should be included; as well as any customer complaints or on any actions taken are relevant. Documents that briefly report on the recall with no enlightening details are not relevant.
<documents> APW20000808.0166 NYT20000809.0226
</top>
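To use a topic as a query, its fields first have to be pulled out of the block above. The following is a minimal Python sketch, assuming each field runs from its tag to the next tag, as in the example, and that field tags are never closed. The function name `parse_topic` and the returned dictionary layout are hypothetical choices for illustration and are not part of the dataset.

```python
import re

# Field tags seen in the topic example; assumed to be the complete set.
FIELD_TAGS = r"<(num|title|toptype|desc|narr|documents)>"

def parse_topic(block: str) -> dict:
    """Parse one <top>...</top> block into a dict of field name -> text."""
    # re.split with a capture group keeps the tag names:
    # [preamble, tag1, text1, tag2, text2, ...]
    parts = re.split(FIELD_TAGS, block)
    fields = {}
    for tag, text in zip(parts[1::2], parts[2::2]):
        text = text.replace("</top>", "")
        # Drop the label that repeats the tag name, e.g. "Number:".
        text = re.sub(r"^\s*(Number|Description|Narrative):", "", text)
        # Collapse line breaks and repeated spaces.
        fields[tag] = " ".join(text.split())
    # The document list is whitespace separated.
    fields["documents"] = fields.get("documents", "").split()
    return fields

# Shortened copy of the example topic (narrative abbreviated).
SAMPLE_TOPIC = """<top>
<num>Number: N54
<title> Firestone Tire Recall
<toptype> Event
<desc>Description: The widespread affects of the Firestone tire recall
<narr>Narrative: Documents that briefly report on the recall with no
enlightening details are not relevant.
<documents> APW20000808.0166 NYT20000809.0226
</top>"""

topic = parse_topic(SAMPLE_TOPIC)
```

The title, description, and narrative can then be combined or weighted separately when building the query; which combination works best is something to tune on the training topics.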
Relevant Document Example
<DOC>
<DOCNO> <s docid="APW20000808.0166" num="1"> APW20000808.0166</s> </DOCNO>
<DOCTYPE> <s docid="APW20000808.0166" num="2"> NEWS STORY</s> </DOCTYPE>
<DATE_TIME> <s docid="APW20000808.0166" num="3"> 2000-08-08 21:19</s> </DATE_TIME>
<BODY>
<HEADLINE> <s docid="APW20000808.0166" num="4"> Source: Firestone To Recall Tires</s> </HEADLINE>
<s docid="APW20000808.0166" num="5"> By NEDRA PICKLER</s>
<TEXT>
<s docid="APW20000808.0166" num="6"> Most of the Firestone ATX, ATX II and Wilderness AT tires are on Ford Explorers, the industry's top-selling SUV, but the recall will include tires on all brands of vehicles, the source said on condition of anonymity.</s>
<s docid="APW20000808.0166" num="7"> The recalled tires will be replaced by other Firestone tires, the source said.</s>
</TEXT>
</BODY>
</DOC>
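Because every sentence in the document is already wrapped in a sentence tag carrying a document ID and a sentence number, no sentence splitter is needed. A minimal Python sketch of the extraction step follows, assuming the tag has the form `<s docid="..." num="...">` with straight quotes; the regular expression and the name `extract_sentences` are illustrative assumptions, not something shipped with the dataset.

```python
import re

# Matches one tagged sentence. Whitespace is optional after "<s" and between
# attributes so that slightly damaged copies of the markup still match.
SENTENCE_RE = re.compile(
    r'<s\s*docid="([^"]+)"\s*num="(\d+)"\s*>(.*?)</s>',
    re.DOTALL,
)

def extract_sentences(doc_text: str) -> list:
    """Return (docid, sentence_number, text) for every tagged sentence."""
    return [
        (docid, int(num), " ".join(body.split()))
        for docid, num, body in SENTENCE_RE.findall(doc_text)
    ]

# Two sentences copied from the example document.
SAMPLE_DOC = """<HEADLINE>
<s docid="APW20000808.0166" num="4"> Source: Firestone To Recall Tires</s>
</HEADLINE>
<TEXT>
<s docid="APW20000808.0166" num="7"> The recalled tires will be replaced
by other Firestone tires, the source said.</s>
</TEXT>"""

sentences = extract_sentences(SAMPLE_DOC)
```

Note that the document number, document type, date, headline, and byline are tagged as sentences too (numbers 1 to 5 in the example), so a system may want to treat them differently from the body text.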
Evaluation • Precision, Recall and F-measure • Usage of the evaluation program (“04.eval_novelty_run.pl” in the dataset)
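For quick checks during development, the three measures can be computed directly from the set of sentences a system returns and the set judged relevant. The sketch below assumes sentences are identified by (docid, sentence number) pairs and that F-measure means the balanced F (equal weight on precision and recall); the official script "04.eval_novelty_run.pl" may average or report differently, so treat it as the reference. The function name and the sample judgments are made up for illustration.

```python
def precision_recall_f(predicted, relevant):
    """Set-based precision, recall and balanced F-measure."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)  # sentences both returned and relevant
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical judgments, not taken from the real qrels file.
DOC = "APW20000808.0166"
system_output = {(DOC, 5), (DOC, 6), (DOC, 7)}
judged_relevant = {(DOC, 6), (DOC, 7), (DOC, 8), (DOC, 9)}

p, r, f = precision_recall_f(system_output, judged_relevant)
# 2 of 3 returned sentences are relevant, and 2 of 4 relevant ones are found.
```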
Contents of the Dataset • “TrainingTopics.txt” (file) • Training topics for system development • “RelevantDocsForTrainingTopics” (dir) • Relevant documents for each training topic • “04.qrels.relevant(TrainingTopics).txt” (file) • The set of relevant sentences for training topics • “04.eval_novelty_run.pl” (file) • Program for evaluation • “AdditionalDocuments(LATIMES)” (dir) • Additional (optional) document collection