Overview of the KBP 2013 Slot Filler Validation Track


Presentation Transcript


  1. Overview of the KBP 2013 Slot Filler Validation Track • Hoa Trang Dang, National Institute of Standards and Technology

  2. Slot Filler Validation (SFV) • Track Goals • Allow teams without a full slot-filling system to participate, focusing on answer validation rather than document retrieval • Evaluate the contribution of RTE systems to KBP slot filling • Allow teams to experiment with system voting and global approaches • SFV input: • Candidate slot fillers • Possibly additional information about the candidate slot fillers • SFV output: • Binary classification (Correct / Incorrect) of each candidate slot filler • Can only improve precision, not recall, of full slot-filling systems • Evaluation metrics depend on the SFV use case and on the availability of additional information about candidate fillers • Predecessor tasks: • TAC RTE KBP Validation task (2011) • TAC KBP Slot Filler Validation task (2012)
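To make the track's input/output concrete, here is a minimal Python sketch of the SFV interface; the CandidateFiller record and validate function are illustrative names, not the track's official submission format.

```python
from dataclasses import dataclass

@dataclass
class CandidateFiller:
    """One candidate slot filler pooled from SF system responses.

    Illustrative record; field names are not the official format.
    """
    query_id: str      # KBP query (the entity being filled)
    slot: str          # e.g., "per:cities_of_residence"
    filler: str        # extracted string, e.g., "Chicago"
    provenance: str    # document ID (and offsets) justifying the filler
    confidence: float  # the SF system's reported confidence

def validate(candidate: CandidateFiller) -> bool:
    """SFV output: one binary Correct/Incorrect judgment per candidate.

    Because a validator can only reject candidates, it can raise the
    precision of an SF run but never its recall.
    """
    raise NotImplementedError  # each SFV team supplies its own classifier
```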

  3. TAC RTE KBP Validation task (2011) • Each slot filler returned by SF systems becomes one RTE evaluation pair, where: • T is the entire document supporting the slot filler • H is a set of synonymous sentences, representing different realizations of the slot filler
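A minimal sketch of this pair construction, assuming hypothetical per-slot hypothesis templates (the 2011 task's actual realizations differed):

```python
# Hypothetical hypothesis templates; the 2011 task's actual per-slot
# realizations differed.
TEMPLATES = {
    "per:cities_of_residence": [
        "{entity} has lived in {filler}.",
        "{entity} resided in {filler}.",
        "{entity} was a resident of {filler}.",
    ],
}

def make_rte_pair(entity: str, slot: str, filler: str, doc_text: str):
    """Turn one candidate slot filler into one RTE evaluation pair.

    T is the entire text of the supporting document; H is a set of
    synonymous sentences expressing the slot relation.
    """
    t = doc_text
    h = [tmpl.format(entity=entity, filler=filler) for tmpl in TEMPLATES[slot]]
    return t, h

# e.g.: make_rte_pair("Barack Obama", "per:cities_of_residence",
#                     "Chicago", full_document_text)
```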

  4. Use Case 1: SFV as Textual Entailment (2011) • SFV input: • All regular English slot filling input (slot definitions, queries, source documents) • Individual candidate slot fillers (filler, provenance) • Local Approach: • Generic textual entailment: H is the relation implied by the candidate slot filler (e.g., “Barack Obama has lived in Chicago”); T is the provenance (the entire document, or smaller regions defined by justification offsets) • Tailored textual entailment: train on different slot types; could serve as a validation module for a full slot-filling system • Evaluation: • F score on the entire pool of candidate slot fillers (unique filler, provenance) • Baseline: all T’s classified as entailing the corresponding H, so P = R = the percentage of entailing pairs in the pooled SF responses • A weak baseline, easily beaten by all SFV systems; not a direct measure of the utility of SFV to SF
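The baseline arithmetic is worth spelling out: when every pair is predicted as entailing, precision and recall both equal the fraction of truly entailing pairs, and so does F1. A small sketch:

```python
def all_entailing_baseline(gold_labels):
    """Score the classify-everything-as-entailing baseline.

    gold_labels: booleans, True where the pair truly entails.
    Predicting every pair positive gives P = R = the fraction of
    entailing pairs in the pool.
    """
    rate = sum(gold_labels) / len(gold_labels)
    precision = recall = rate
    # F1 = 2PR / (P + R); with P == R this collapses to the rate itself.
    f1 = rate
    return precision, recall, f1

# e.g., if 40% of pooled pairs entail: P = R = F1 = 0.40
```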

  5. Use Case 2: SFV impact on single SF systems • SFV input: • All regular English slot filling input (slot definitions, queries, source documents) • Individual candidate slot fillers (filler, provenance, confidence) • Broken out into individual slot filling runs • Global Approach: • System Voting, leveraging features across multiple SF runs • Evaluation: • Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run
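A minimal sketch of the filtering step, assuming dict-shaped candidates and a hypothetical candidate_key identity (the official scorer's keying may differ):

```python
def candidate_key(c):
    # Illustrative identity: (query, slot, filler string, provenance).
    return (c["query_id"], c["slot"], c["filler"], c["provenance"])

def filter_run(run, sfv_judgments):
    """Drop from one SF run every filler the SFV system judged Incorrect.

    run: list of candidate dicts from a single SF submission.
    sfv_judgments: maps candidate_key(...) -> True (Correct) / False.
    Candidates with no judgment are kept. The filtered run is then
    scored with the regular English SF scorer and compared against
    the unfiltered run's score.
    """
    return [c for c in run if sfv_judgments.get(candidate_key(c), True)]
```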

  6. Slot Filler Validation (SFV) 2012 • SFV input: • All regular English slot filling input (slot definitions, queries, source documents) • Individual candidate slot fillers (filler, provenance, confidence) • Broken out into individual slot filling runs • SFV output: • Binary classification (Correct / Incorrect) of each candidate slot filler • Evaluation: • Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run

  7. Slot Filler Validation (SFV) 2012 • SFV input: • All regular English slot filling input (slot definitions, queries, source documents) • Individual candidate slot fillers (filler, provenance, confidence) • Broken out into individual slot filling runs • SFV output: • Binary classification (Correct / Incorrect) of each candidate slot filler • Evaluation: • Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run • Result: the single SFV submission decreased the F1 of almost all SF runs, except for the poorest-performing ones

  8. Slot Filler Validation (SFV) 2013 • SFV input: • All regular English slot filling input (slot definitions, queries, source documents) • Individual candidate slot fillers (filler, provenance, confidence) • Broken out into individual slot filling runs • SFV output: • Binary classification (Correct / Incorrect) of each candidate slot filler • Evaluation: • Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run

  9. Slot Filler Validation (SFV) 2013 • SFV input: • All regular English slot filling input (slot definitions, queries, source documents) • Individual candidate slot fillers (filler, provenance, confidence) • Broken out into individual slot filling runs • System profile for each SF run • Preliminary assessment of 10% of KBP 2013 Slot Filling queries • SFV output: • Binary classification (Correct / Incorrect) of each candidate slot filler • Evaluation: • Filter out “Incorrect” slot fillers from each run, and score according to regular English SF; compare to score for original run • Score only on the 90% of KBP 2013 slot filling queries that didn’t have preliminary assessments released as part of SFV input
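A one-function sketch of the scoring restriction, with illustrative names:

```python
def scorable_query_ids(all_query_ids, preassessed_query_ids):
    """Restrict SFV scoring to queries whose assessments were withheld.

    Preliminary assessments for ~10% of the KBP 2013 SF queries were
    released as SFV input, so official scoring uses only the other
    ~90% of queries.
    """
    released = set(preassessed_query_ids)
    return [q for q in all_query_ids if q not in released]
```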

  10. SF System Profile • SF team ranks in KBP 2009–2012 • Did the system extract fillers from the KBP 2013 source corpus? • Do the confidence values have meaning? • Is the confidence value a probability? • Tools or methods for: • Query expansion • Document retrieval • Sentence retrieval • NER / nominal tagging • Coreference resolution • Third-party relation/event extraction • Dependency/constituent parsing • POS tagging • Chunking • Main slot filling algorithm • Learning algorithm • Ensemble model • External resources
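As a sketch, the profile could be carried as a simple record; the field names below paraphrase the checklist above and are not the official schema:

```python
from dataclasses import dataclass, field

@dataclass
class SystemProfile:
    """Self-reported profile for one SF run.

    Field names paraphrase the checklist above; the official profile
    format may differ.
    """
    team_ranks_2009_2012: dict            # e.g., {2012: 3}
    used_kbp2013_source_corpus: bool      # extracted fillers from the 2013 corpus?
    confidences_meaningful: bool          # do confidence values have meaning?
    confidence_is_probability: bool
    tools: dict = field(default_factory=dict)   # e.g., {"NER": "Stanford CoreNLP"}
    main_slot_filling_algorithm: str = ""
    learning_algorithm: str = ""
    ensemble_model: bool = False
    external_resources: list = field(default_factory=list)
```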

  11. Slot Filler Validation Teams and Approaches • BIT: Beijing Institute of Technology [local] • Generic RTE approach based on word overlap, cosine similarity, and token edit distance • Stanford: Stanford University [local] • Based on Stanford’s full slot-filling system, especially the component for checking the consistency and validity of candidate fillers • UI_CCG: University of Illinois at Urbana-Champaign [local] • Tailored RTE approach; checks candidates against slot-specific constraints • jhuapl: Johns Hopkins University Applied Physics Laboratory [weak global] • Considers only the confidence value associated with each candidate filler, aggregating confidence values across systems • RPI_BLENDER: Rensselaer Polytechnic Institute [strong global] • Based on the RPI_BLENDER full slot-filling system (like Stanford), but also leveraged the full set of SFV input (including SF system profiles and preliminary assessments) to rank systems and apply tier-specific filtering
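A minimal sketch of the "weak global" idea, assuming mean-confidence aggregation and an illustrative threshold (jhuapl's actual aggregation and threshold are not specified here):

```python
from collections import defaultdict

def weak_global_vote(runs, threshold=0.5):
    """Judge candidates by aggregating raw confidences across SF runs.

    runs: iterable of runs; each run is a list of (candidate_key, confidence)
    pairs, where candidate_key identifies a unique (query, slot, filler).
    A candidate is judged Correct when its mean confidence over the runs
    that returned it reaches the threshold. Mean aggregation and the 0.5
    threshold are illustrative choices, not jhuapl's actual ones.
    """
    confidences = defaultdict(list)
    for run in runs:
        for cand_key, conf in run:
            confidences[cand_key].append(conf)
    return {k: sum(v) / len(v) >= threshold for k, v in confidences.items()}
```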

  12. Impact of RPI_BLENDER2 SFV on SF Runs • [Charts: “Top 10 SF runs”; “Negatively impacted SF runs”]

  13. Conclusion • Leveraging global features boosts the scores of individual SF runs… if done selectively • Don’t treat all slot filling systems the same • Even weak global features (e.g., raw confidence values) may help in some cases • Caveat: other evaluation metrics are also valid, depending on the use case • The RTE KBP validation (2011) metric may be appropriate if the goal is to make assessment more efficient
