Shachar Mirkin Joint work with: Ido Dagan, Eyal Shnarch EACL-09 Evaluating the Inferential Utility of Lexical-Semantic Resources
You are here Quick Orientation
Quick Orientation – Lexical Inference Who won the football match between Israel and Greece on Wednesday? ATHENS, April 1 (Reuters) – Greece beats Israel 2-1 in their World Cup Group Two qualifier.
Motivation • Common knowledge: Lexical relations are useful for semantic inference • Common practice: Exploit lexical-semantic resources • WordNet - synonymy, hyponymy • Distributional similarity • Yet, no clear picture: • Which semantic relations are needed? • How and when should they be utilized? • What’s available in current resources and what’s missing? • Our goal - clarify the picture • through comparative evaluation
A Textual Entailment Perspective • Generic framework for semantic inference • Recognizing that one text (h) is entailed by another (t) • Addresses variability in text • Applied semantic inference reducible to entailment • Useful for generic evaluation of lexical inference
Lexical-semantic relationships t: Dear EACL 2009 Participant, We are sorry to inform you that an Air Traffic and Public Transportation strike has been announced for Thursday 2 April, 2009. h: Athens’ Metro services disrupted in April 2009.
Lexical-semantic relationships t: Dear EACL 2009 Participant, We are sorry to inform you that an Air Traffic and Public Transportation strike has been announced for Thursday 2 April, 2009. h: Athens’ Metro services disrupted in April 2009. • Relations linking t and h (diagram labels): verb entailment, located in, hypernymy • Terminology: • Lexical Entailment • Entailment Rules: LHS → RHS • strike → disrupt • Rule Application • Should be found in knowledge resources • but often not available
Lexical-semantic relationships t: Dear EACL 2009 Participant, We are sorry to inform you that an Air Traffic and Public Transportation strike has been announced for Thursday 2 April, 2009. h: Athens’ Metro disruptions • Same inference applies when h is a lexical phrase (e.g. in IR)
Resources for Lexical-Semantic Relationships • Plenty of resources are out there • None dedicated to lexical entailment inference • We evaluated 7 popular resources, of varying nature: • Construction method • Relation types • Extracted relations which: • Are commonly used in applications • Correspond to lexical entailment
Evaluated Resources • WordNet • XWN • Wiki • Snow (statistical extension of WordNet) • CBC • Lin-Dep • Lin-Prox • Categories shown in the diagram: based on human knowledge; corpus-based; statistical extension of WordNet
Evaluation Rationale • Evaluation goal • Assess the practical utility of resources • A resource’s utility • Depends on the validity of its rule applications • Rather than on the % of correct rules • Many rules, correct or incorrect, may hardly ever be applied • Simulate rule applications and judge their validity • Instance-based evaluation (rather than rule-based)
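The contrast between rule-based and instance-based evaluation can be illustrated with a small sketch. Everything here is hypothetical: the `Rule` record, the `loch → water` rule, the correctness labels, and all counts are invented for illustration and are not taken from the evaluated resources.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    lhs: str
    rhs: str
    is_correct: bool           # hypothetical judgment of the rule in isolation
    valid_applications: int    # hypothetical count of applications judged valid
    invalid_applications: int  # hypothetical count of applications judged invalid


def rule_based_precision(rules):
    """Share of rules judged correct, ignoring how often each rule is applied."""
    return sum(r.is_correct for r in rules) / len(rules)


def instance_based_precision(rules):
    """Share of rule applications judged valid, pooled over all rules."""
    valid = sum(r.valid_applications for r in rules)
    total = valid + sum(r.invalid_applications for r in rules)
    return valid / total if total else 0.0


# Invented numbers, for illustration only.
rules = [
    Rule("lake", "water", True, 8, 2),
    Rule("soil", "water", False, 0, 10),
    Rule("loch", "water", True, 0, 0),  # correct, but never applied
]

rule_view = rule_based_precision(rules)          # 2 of 3 rules are correct
instance_view = instance_based_precision(rules)  # 8 of 20 applications are valid
```

With these made-up counts the two views diverge: a rule that is never applied raises the rule-based figure but has no effect on the instance-based one.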
Evaluation Scheme Input: • Entailment rules from each resource • A sample of test hypotheses • 25 noun-noun queries from TREC 1-8 • railway accidents; outpatient surgery; police deaths • Texts from which the hypotheses may be inferred • TREC corpora Evaluation flow: • Apply rules to find possibly entailing texts • Judge rule applications • Utilize human annotation to avoid dependence on a specific system
Evaluation Methodology • Test hypothesis: h = water pollution • For each word in h, take rules from the resource: r1 = lake → water, r2 = soil → water • Generate intermediate hypotheses: h’1 = lake pollution, h’2 = soil pollution • Retrieve matching texts t1, t2, t3, … from the corpus and sample them • Does t entail h’? • No: t is discarded, e.g. "High levels of air pollution were measured around the lake" • Yes: does t entail h? • Yes: valid rule application (+), e.g. "Chemicals dumped into the lake are the main cause for its pollution" • No: invalid rule application (-), e.g. "Soil pollution happens when contaminants adhere to the soil"
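The flow on this slide can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the authors' implementation: the function names are invented, retrieval is reduced to a crude bag-of-words match, and the two entailment judgments are passed in as booleans because in the evaluation they come from human annotators.

```python
import re


def intermediate_hypotheses(hypothesis, rules):
    """For each rule lhs -> rhs whose rhs is a word of h, replace rhs with lhs."""
    words = hypothesis.split()
    results = []
    for lhs, rhs in rules:
        for i, word in enumerate(words):
            if word == rhs:
                h_prime = " ".join(words[:i] + [lhs] + words[i + 1:])
                results.append(((lhs, rhs), h_prime))
    return results


def matches(text, h_prime):
    """Crude stand-in for retrieval: every word of h' occurs in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return all(word in tokens for word in h_prime.lower().split())


def judge(entails_h_prime, entails_h):
    """Map the two human judgments to the outcome of a rule application."""
    if not entails_h_prime:
        return "discarded"
    return "valid" if entails_h else "invalid"


rules = [("lake", "water"), ("soil", "water")]
candidates = intermediate_hypotheses("water pollution", rules)

lake_text = "Chemicals dumped into the lake are the main cause for its pollution"
air_text = "High levels of air pollution were measured around the lake"
soil_text = "Soil pollution happens when contaminants adhere to the soil"
```

Note that `air_text` matches "lake pollution" at the word level even though it does not entail it, which is why the first judgment (does t entail h’?) is needed before a rule application is counted at all.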
Results - Metrics • Precision: • Percentage of valid rule applications for the resource • Total number of texts entailing the hypothesis is unknown • Absolute recall cannot be measured • Recall-share: • % of entailing sentences retrieved by the resource rules, relative to all entailing texts retrieved by both the original hypothesis and the rules • Macro-average figures
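As a rough sketch of how these metrics fit together, the following computes precision and recall-share per hypothesis and then macro-averages them. The function names and all counts are hypothetical, and the sketch assumes the texts retrieved by the rules and by the original hypothesis are counted separately; any estimation from samples is left out.

```python
def precision(valid_applications, judged_applications):
    """Share of judged rule applications that were valid."""
    return valid_applications / judged_applications if judged_applications else 0.0


def recall_share(entailing_by_rules, entailing_by_hypothesis):
    """Entailing texts found via the rules, relative to all entailing texts
    found via either the rules or the original hypothesis."""
    total = entailing_by_rules + entailing_by_hypothesis
    return entailing_by_rules / total if total else 0.0


def macro_average(values):
    """Unweighted mean over hypotheses, so each hypothesis counts equally."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Invented per-hypothesis counts: (entailing via rules, entailing via h itself).
per_hypothesis = [(3, 9), (1, 1)]
shares = [recall_share(by_rules, by_h) for by_rules, by_h in per_hypothesis]
macro_recall_share = macro_average(shares)
```

Macro-averaging means a hypothesis with few retrieved texts weighs as much as one with many, which keeps a handful of frequent queries from dominating the figures.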
Results • Precision: • Precision generally quite low • Relatively high precision for resources based on human knowledge • Vs. corpus-based methods • Snow – still high precision • Recall: • Some resources obtain very little recall • WordNet’s recall is limited • Many more relations are found within (inaccurate) distributional-similarity resources
Results Analysis: Current Scope and Gaps
Missing Relations • Coverage of most resources is limited • Lin’s coverage substantially larger than WordNet’s • But not usable due to low precision • Missing instances of existing WordNet relations • Proper names • Open class words • Missing non-standard relation types (next slide)
Non-Standard Entailment Relations • Such relations had significant impact on recall • Don’t comply with any WordNet relation • Mostly in Lin’s resources (1/3 of their recall) • Examples of sub-types: • Topical entailment - IBM (company) → computers • Consequential - childbirth → motherhood • Entailments of arguments by predicate – breastfeeding → baby • Often non-substitutable
Required Auxiliary Info (1) • Additional information needed for proper rule application: • Should be attached to rules in resources • and considered by inference systems • Rules’ priors • Likelihood of a rule being correctly applied in an arbitrary context • Some information is available (WordNet’s sense order, Lin’s ranks) • Empirically tested: not sufficient on its own (too much recall lost) • Using only the top-50 rules, Lin-Prox loses 50% of relative recall • Using only the first sense, WordNet loses 60%
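The kind of test described here, keeping only the highest-ranked rules and checking how much recall is lost, might look like the sketch below. The rule list, its ranking, and the per-rule counts are invented for illustration; the function names are assumptions, and the actual ranks come from the resources themselves.

```python
def top_k_rules(ranked_rules, k):
    """Keep the k highest-ranked rules; the list is assumed sorted best-first."""
    return ranked_rules[:k]


def relative_recall_loss(entailing_per_rule, kept_rules):
    """Fraction of entailing texts lost when only kept_rules are applied."""
    total = sum(entailing_per_rule.values())
    kept = sum(entailing_per_rule[rule] for rule in kept_rules)
    return 1 - kept / total if total else 0.0


# Hypothetical ranking and hypothetical counts of entailing texts per rule.
ranked = [("lake", "water"), ("river", "water"), ("soil", "water"), ("sea", "water")]
entailing = {
    ("lake", "water"): 4,
    ("river", "water"): 3,
    ("soil", "water"): 0,
    ("sea", "water"): 3,
}

kept = top_k_rules(ranked, 2)
loss = relative_recall_loss(entailing, kept)  # 3 of 10 entailing texts are lost
```

In this toy example a low-ranked rule still contributes entailing texts, which mirrors the slide's point that a prior alone discards too much recall.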
Required Auxiliary Info (2) Lexical context • Known issue: rules should be applied only in appropriate contexts • Main reason for relatively low precision of WordNet • Addressed by WSD or context-matching models Logical context • Some frequently-ignored relations in WordNet are significant: • efficacy → ineffectiveness (antonymy) • arms → guns (hypernymy) • government → official (holonymy) • 1/7 of Lin-Dep recall • Require certain logical conditions to occur • Include info about suitable lexical & logical contexts of rules • Combine priors with context-model scores (Szpektor et al. 2008) • Needed: typology of relations by inference types
Conclusions • Current resources far from being sufficient • Lexical relations should be evaluated relative to applied inference • Rather than on correlations with human associations or WordNet • Need dedicated resources for lexical inference rules • Acquire additional missing rule instances • Specify and add missing relation types • Add auxiliary information needed for rule application
Conclusions – Community Perspective • Observation: missing feedback about resource utility for inference in applications • Resources and applications typically developed separately • Need tighter feedback between them • Community effort required: • Publicly available resources for lexical inference • Publicly available inference applications • Application-based evaluation datasets • Standardize formats/protocols for their integration
Shachar Mirkin mirkins@cs.biu.ac.il Thank you!