Linguistic Resources for the 2012 TAC KBP Entity Linking Evaluations
Joe Ellis (presenter), Xuansong Li, Brendan Callahan, Stephanie Strassel
Linguistic Data Consortium, University of Pennsylvania, USA
Outline
• English, Chinese and Spanish source data
• Annotator and assessor guidelines
• Labeled training and evaluation data
• Annotation Tasks and Methodologies
  • Namestring Selection
  • KB Linking
  • NIL Coreference
Linguistic Resources for 2012 Entity Linking, TAC KBP Evaluation Workshop – NIST, November 5-6, 2012
Source Corpus (2012)
KB and Guidelines
Knowledge Base / Corpus / Guidelines
• Annotator GUI and pipeline revised to improve efficiency and quality over previous years
  • Enhanced ability to select ambiguous and varied queries
  • Resulted in more challenging queries
• Guidelines available at:
  • http://www.nist.gov/tac/2012/KBP/task_guidelines/index.html
Existing EL Training Data
New EL Training & Eval Data
Entity Linking Overview
• Stage 1: Select namestrings and reference documents
• Stage 2: Link namestrings to the KB or mark them as NIL
• Stage 3: Co-reference NIL entities
Entity Linking – Stage 1
• Run named entity taggers over source corpora*
  • Provides guided search through the corpus
• Namestring Selection
  • Confusable, ambiguous, varied
  • Balance NIL and non-NIL (target even distribution)
  • Balance by entity type (1/3 each GPE, PER, and ORG)
  • Genre: 2/3 NW, 1/3 Web for English and Chinese; all NW for Spanish
  • For cross-lingual tasks, especially target non-English queries for entities that are also mentioned in English documents
*Thank you to the track coordinators for providing tagger output
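The Stage 1 balancing targets (even NIL/non-NIL split, 1/3 per entity type, and the per-language genre split) can be turned into integer query counts per cell. The sketch below is illustrative only and is not LDC's selection tool: `allocate_quotas` and `GENRE_SPLIT` are hypothetical names, and largest-remainder rounding is an assumption chosen simply so the integer counts always sum to the requested total.

```python
from fractions import Fraction
from itertools import product

ENTITY_TYPES = ("GPE", "PER", "ORG")
NIL_STATUS = ("NIL", "non-NIL")

# Genre targets from the slide: 2/3 NW, 1/3 Web for English and Chinese; all NW for Spanish.
GENRE_SPLIT = {
    "English": {"NW": Fraction(2, 3), "Web": Fraction(1, 3)},
    "Chinese": {"NW": Fraction(2, 3), "Web": Fraction(1, 3)},
    "Spanish": {"NW": Fraction(1, 1)},
}


def allocate_quotas(total: int, language: str) -> dict[tuple[str, str, str], int]:
    """Hypothetical helper: split `total` queries into (type, NIL status, genre) targets."""
    cells = []
    for etype, nil, (genre, share) in product(
        ENTITY_TYPES, NIL_STATUS, GENRE_SPLIT[language].items()
    ):
        # Exact (fractional) target: equal share per type and NIL status, times genre share.
        exact = total * Fraction(1, len(ENTITY_TYPES)) * Fraction(1, len(NIL_STATUS)) * share
        cells.append(((etype, nil, genre), exact))

    # Start from the floor of each exact target (int() truncates; targets are non-negative).
    quotas = {key: int(exact) for key, exact in cells}
    leftover = total - sum(quotas.values())

    # Assumed rounding rule: give remaining queries to cells with the largest fractional parts.
    by_remainder = sorted(cells, key=lambda cell: cell[1] - int(cell[1]), reverse=True)
    for key, _ in by_remainder[:leftover]:
        quotas[key] += 1
    return quotas
```

For example, `allocate_quotas(90, "English")` gives 10 NW and 5 Web queries for each of the six type/NIL combinations. Real selection also depends on which confusable namestrings the taggers actually surface, so counts like these would serve as targets rather than hard constraints.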
Entity Linking – Stages 2 & 3
• KB Linking
  • Review the reference document and search the KB for a matching node
  • Multiple entities viewed together for quicker linking
  • Time-limited quality control pass enhanced completeness and accuracy
• NIL Coreference
  • NIL queries (no KB match) require manual co-reference annotation
  • Time-limited quality control pass enhanced completeness and accuracy
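NIL coreference groups NIL queries that refer to the same real-world entity. The following minimal sketch shows how such judgments could be turned into cluster labels. It assumes annotators record pairwise "same entity" decisions between NIL queries and is not the LDC annotation tool. `cluster_nil_queries`, the query IDs, and the `NIL0001`-style label format are hypothetical. The transitive closure is computed with a standard union-find structure.

```python
class UnionFind:
    """Disjoint-set structure over hashable items, with path halving."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_nil_queries(nil_query_ids: list[str],
                        same_entity_pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Hypothetical helper: map each NIL query to a shared cluster label."""
    uf = UnionFind(nil_query_ids)
    for a, b in same_entity_pairs:
        uf.union(a, b)  # pairwise judgments merge transitively

    labels: dict[str, str] = {}
    cluster_ids: dict[str, str] = {}
    for qid in nil_query_ids:  # iterate in input order so labels are deterministic
        root = uf.find(qid)
        if root not in cluster_ids:
            cluster_ids[root] = f"NIL{len(cluster_ids) + 1:04d}"
        labels[qid] = cluster_ids[root]
    return labels
```

With queries `["q1", "q2", "q3", "q4"]` and judgments `[("q1", "q3"), ("q3", "q4")]`, `q1`, `q3`, and `q4` share `NIL0001` and `q2` gets `NIL0002`. Because the closure is transitive, a single mistaken link merges two whole clusters, which is one reason a quality control pass over NIL coreference is useful.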
Conclusions
• 2012 Achievements
  • Source corpus expansion
  • 5 new EL corpora developed (one fewer than 2009-2011 combined)
  • New annotation pipeline/GUI supports creation of more challenging queries in less time
• 2013 Goals
  • Further enhance the annotation GUI and pipeline; address lingering inefficiencies and bugs
  • Further discussion of desired query qualities to fully utilize new capabilities