Evaluating the Inferential Utility of Lexical-Semantic Resources
Shachar Mirkin, joint work with Ido Dagan and Eyal Shnarch. EACL-09
Quick Orientation – Lexical Inference
Who won the football match between Israel and Greece on Wednesday?
ATHENS, April 1 (Reuters) – Greece beats Israel 2-1 in their World Cup Group Two qualifier.
Motivation • Common knowledge: lexical relations are useful for semantic inference • Common practice: exploit lexical-semantic resources • WordNet – synonymy, hyponymy • Distributional similarity • Yet, no clear picture: • Which semantic relations are needed? • How and when should they be utilized? • What’s available in current resources and what’s missing? • Our goal – clarify the picture • through comparative evaluation
A Textual Entailment Perspective • Generic framework for semantic inference • Recognizing that one text (h) is entailed by another (t) • Addresses variability in text • Applied semantic inference reducible to entailment • Useful for generic evaluation of lexical inference
Lexical-semantic relationships t: Dear EACL 2009 Participant, We are sorry to inform you that an Air Traffic and Public Transportation strike has been announced for Thursday 2 April, 2009. h: Athens’ Metro services disrupted in April 2009.
Lexical-semantic relationships
t: Dear EACL 2009 Participant, We are sorry to inform you that an Air Traffic and Public Transportation strike has been announced for Thursday 2 April, 2009.
h: Athens’ Metro services disrupted in April 2009.
Relation types linking t to h (diagram labels): verb entailment, located in, hypernymy
• Terminology: • Lexical Entailment • Entailment Rules: LHS → RHS (e.g. strike → disrupt) • Rule Application
• Such rules should be found in knowledge resources • but are often not available
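To make the terminology concrete, here is a minimal sketch (my own illustration, not the authors' system) of representing directional entailment rules and finding the rules that could bridge a text to a hypothesis. The `EntailmentRule` class and `bridging_rules` helper are hypothetical names, and the word lists are assumed to be lemmatized and lowercased.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EntailmentRule:
    """A directional lexical entailment rule: LHS -> RHS (LHS entails RHS)."""
    lhs: str
    rhs: str

def bridging_rules(t_words, h_words, rules):
    """Return the rules whose LHS occurs in the text and whose RHS occurs in
    the hypothesis, i.e. rule applications that could license t -> h."""
    t_set, h_set = set(t_words), set(h_words)
    return [r for r in rules if r.lhs in t_set and r.rhs in h_set]

# The rule from the slide; words are assumed lemmatized/lowercased.
rules = [EntailmentRule("strike", "disrupt")]
t = "an air traffic and public transportation strike has been announce".split()
h = "athens metro service disrupt in april 2009".split()
print(bridging_rules(t, h, rules))
# -> [EntailmentRule(lhs='strike', rhs='disrupt')]
```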
Lexical-semantic relationships t: Dear EACL 2009 Participant, We are sorry to inform you that an Air Traffic and Public Transportation strike has been announced for Thursday 2 April, 2009. h: Athens’ Metro disruptions • Same inference when h is a lexical phrase (e.g. an IR query)
Resources for Lexical-Semantic Relationships • Plenty of resources are out there • None is dedicated to lexical entailment inference • We evaluated 7 popular resources of varying nature: • Construction method • Relation types • From each, we extracted relations which: • Are commonly used in applications • Correspond to lexical entailment
Evaluated Resources
• Based on human knowledge: WordNet, XWN, Wiki
• Corpus-based: CBC, Lin-Dep, Lin-Prox
• Statistical extension of WordNet: Snow
Evaluation Rationale • Evaluation goal: assess the practical utility of resources • A resource’s utility depends on the validity of its rule applications • Vs. the % of correct rules – many correct & incorrect rules may hardly ever be applied • Simulate rule applications and judge their validity • Instance-based evaluation (rather than rule-based)
Evaluation Scheme Input: • Entailment rules from each resource • A sample of test hypotheses • 25 noun-noun queries from TREC 1-8 • railway accidents; outpatient surgery; police deaths • Texts from which the hypotheses may be inferred • TREC corpora Evaluation flow: • Apply rules to find possibly entailing texts • Judge rule applications • Utilize human annotation to avoid dependence on a specific system
Evaluation Methodology
• Test hypothesis: h = water pollution
• Rules from the resource, for each word in h: r1 = lake → water (a correct rule), r2 = soil → water (an incorrect rule)
• Generate intermediate hypotheses: h’1 = lake pollution, h’2 = soil pollution
• Retrieve matching texts t1, t2, t3, … from the corpus and sample texts for judgment
• For each sampled text, judge: does t entail h’? does t entail h?
• “Chemicals dumped into the lake are the main cause for its pollution” – entails h’1 and h: valid rule application
• “High levels of air pollution were measured around the lake” – does not entail h’1: t is discarded
• “Soil pollution happens when contaminants adhere to the soil” – entails h’2 but not h: invalid rule application
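A compact sketch of this flow, under my own simplifying assumptions (single-word substitution, toy rule and judgment functions; the names `Rule`, `intermediate_hypotheses`, and `judge` are illustrative, not from the paper):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    lhs: str  # e.g. "lake"
    rhs: str  # e.g. "water"

def intermediate_hypotheses(h_words, rules):
    """Apply each rule whose RHS occurs in h by substituting LHS for RHS,
    yielding (rule, h') pairs, e.g. 'water pollution' -> 'lake pollution'."""
    return [(r, [r.lhs if w == r.rhs else w for w in h_words])
            for r in rules if r.rhs in h_words]

def judge(entails_h_prime, entails_h):
    """Map the two human judgments onto the evaluation outcome."""
    if not entails_h_prime:
        return "discarded"                 # the lexical match was coincidental
    return "valid" if entails_h else "invalid"

h = ["water", "pollution"]
rules = [Rule("lake", "water"), Rule("soil", "water")]
for rule, h_prime in intermediate_hypotheses(h, rules):
    print(rule.lhs, "->", rule.rhs, ":", " ".join(h_prime))

# Judgments for the three example texts above:
print(judge(True, True))    # "Chemicals dumped into the lake ..." -> valid
print(judge(False, False))  # "High levels of air pollution ..."   -> discarded
print(judge(True, False))   # "Soil pollution happens when ..."    -> invalid
```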
Results - Metrics • Precision: percentage of valid rule applications for the resource • The total number of texts entailing the hypothesis is unknown, so absolute recall cannot be measured • Recall-share: % of entailing sentences retrieved by the resource’s rules, relative to all entailing texts retrieved by the original hypothesis and the rules combined • Figures are macro-averaged over the test hypotheses
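A rough sketch of how these figures could be computed; the per-hypothesis counts below are invented for illustration, and `recall_share` simply follows the definition above.

```python
def precision(valid, invalid):
    """Per-hypothesis precision: share of the resource's rule applications
    that were judged valid."""
    applied = valid + invalid
    return valid / applied if applied else 0.0

def recall_share(entailing_via_rules, entailing_total):
    """Entailing texts retrieved via the resource's rules, relative to all
    entailing texts retrieved by the original hypothesis plus the rules."""
    return entailing_via_rules / entailing_total if entailing_total else 0.0

def macro_average(values):
    return sum(values) / len(values) if values else 0.0

# One tuple per test hypothesis (hypothetical counts):
# (valid applications, invalid applications, entailing via rules, all entailing texts)
counts = [(8, 2, 8, 20), (3, 7, 3, 15), (0, 5, 0, 10)]
print("macro precision:   ", macro_average([precision(v, i) for v, i, _, _ in counts]))
print("macro recall-share:", macro_average([recall_share(r, a) for _, _, r, a in counts]))
```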
Results • Precision: • Precision is generally quite low • Relatively high precision for resources based on human knowledge, vs. corpus-based methods • Snow – still high precision • Recall: • Some resources obtain very little recall • WordNet’s recall is limited • Many more relations are found within (inaccurate) distributional-similarity resources
Results Analysis: Current Scope and Gaps
Missing Relations • Coverage of most resources is limited • Lin’s coverage is substantially larger than WordNet’s • But not usable due to low precision • Missing instances of existing WordNet relations: • Proper names • Open-class words • Missing non-standard relation types (next slide)
Non-Standard Entailment Relations • Such relations had a significant impact on recall • They don’t comply with any WordNet relation • Found mostly in Lin’s resources (1/3 of their recall) • Example sub-types: • Topical entailment – IBM (company) → computers • Consequential – childbirth → motherhood • Entailment of arguments by the predicate – breastfeeding → baby • Often non-substitutable
Required Auxiliary Info (1) • Additional information is needed for proper rule application: • It should be attached to rules in resources • and considered by inference systems • Rules’ priors: • The likelihood of a rule to be correctly applied in an arbitrary context • Some information is available (WordNet’s sense order, Lin’s ranks) • Empirically tested – not sufficient on its own (too much recall is lost): • Using the top-50 rules, Lin-Prox loses 50% of its relative recall • Using the first sense only, WordNet loses 60%
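A minimal sketch of such a prior-based filter, assuming each resource provides a scored rule list per hypothesis word; the scores and the `filter_by_prior` helper are invented for illustration, not taken from the paper.

```python
def filter_by_prior(scored_rules, k=50):
    """Keep only the k rules with the highest prior score (e.g. Lin's rank
    or WordNet's sense order), trading recall for precision."""
    return sorted(scored_rules, key=lambda r: r[2], reverse=True)[:k]

# Hypothetical (lhs, rhs, prior) triples for the hypothesis word "water":
scored = [("lake", "water", 0.31), ("river", "water", 0.28), ("soil", "water", 0.05)]
print(filter_by_prior(scored, k=2))   # the low-prior rule soil -> water is dropped
```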
Required Auxiliary Info (2) Lexical context • Known issue: rules should be applied only in appropriate contexts • The main reason for WordNet’s relatively low precision • Addressed by WSD or context-matching models Logical context • Some frequently-ignored relations in WordNet are significant: • efficacy → ineffectiveness (antonymy) • arms → guns (hypernymy) • government → official (holonymy) • 1/7 of Lin-Dep’s recall • They require certain logical conditions to hold • Include information about the suitable lexical & logical contexts of rules • Combine priors with context-model scores (Szpektor et al. 2008) • Needed: a typology of relations by inference types
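One simple way to combine the two signals, sketched under my own assumptions: a linear interpolation with an arbitrary alpha and threshold. This is only an illustration of the idea; Szpektor et al. (2008) describe the actual combination of priors with context models.

```python
def application_score(rule_prior, context_match, alpha=0.5):
    """Combine a rule's prior likelihood with a context-match score for a
    specific text; the interpolation and alpha=0.5 are assumptions."""
    return alpha * rule_prior + (1 - alpha) * context_match

def should_apply(rule_prior, context_match, threshold=0.6):
    return application_score(rule_prior, context_match) >= threshold

# "strike -> disrupt" applied in a transportation-strike context vs. an
# unrelated context (e.g. a bowling strike); the scores are hypothetical.
print(should_apply(rule_prior=0.7, context_match=0.9))   # True  - apply the rule
print(should_apply(rule_prior=0.7, context_match=0.2))   # False - wrong context
```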
Conclusions • Current resources are far from sufficient • Lexical relations should be evaluated relative to applied inference • Rather than by correlation with human associations or with WordNet • Dedicated resources for lexical inference rules are needed: • Acquire the missing rule instances • Specify and add the missing relation types • Add the auxiliary information needed for rule application
Conclusions – Community Perspective • Observation: feedback about resource utility for inference in applications is missing • Resources and applications are typically developed separately • Tighter feedback between them is needed • A community effort is required: • Publicly available resources for lexical inference • Publicly available inference applications • Application-based evaluation datasets • Standardized formats/protocols for their integration
Shachar Mirkin mirkins@cs.biu.ac.il Thank you!