Approximating Textual Entailment with LFG and FrameNet Frames
Aljoscha Burchardt and Anette Frank
Computational Linguistics Department, Saarland University, Saarbrücken
Language Technology Lab, DFKI GmbH, Saarbrücken
SALSA Workshop, Saarbrücken, June 27-28, 2006
Multilingual semantic annotation: theory and applications
Overview
The PASCAL Recognizing Textual Entailment task (RTE): What is it, and how to approach it?
The SALSA RTE System: A baseline system for approximating Textual Entailment
• Building on LFG-based syntactic analysis and frame semantics
• Computing structural and semantic overlap as an approximation of textual entailment in a learning architecture
• Open architecture for future extensions towards deeper modelling
Linguistic analysis: LFG and FrameNet frames
Approximating Textual Entailment
• Computing a match graph for structural and semantic overlap
• Feature extraction and machine learning
Results of this year’s RTE task
• Discussion, error analysis and perspectives
Conclusion
The PASCAL RTE Task: What is it?
• A recently established challenge for the NLP/AI community
• Testing a system’s capacity to recognize “Textual Entailment”
• “Realistic”, open-domain data set drawn from system outputs in NLP applications: IR, IE, QA, SUM
• Controlled set-up: balanced training and test sets, 800/800 text-hypothesis pairs
TASK: Entailed?
Text: Sunday’s earthquake was felt in the southern Indian city of Madras on the mainland, as well as other parts of south India.
Hypothesis: The city of Madras is located in Southern India.
Entailed? – Yes
Taking a look at the data
Fine-grained linguistic analysis
T: Oscar-winning actor Nicolas Cage’s new son and Superman have something in common ...
H: Nicolas Cage’s new son was awarded an Oscar. (No, IE)
Lexical semantics and paraphrases (nominalisation, synonymy)
T: [o]n December 10th 1936 King Edward VIII gave up his right to the British throne.
H: King Edward VIII abdicated on the 10th of December, 1936. (Yes, QA)
Inference and world knowledge
T: Olson, 62, previously worked as a partner at Ernst & Young LLP, before joining the Fed board in 2001, to serve a term ending in 2010.
H: Olson is a member of the Fed board. (Yes, IE)
Modality
T: U.S. Secretary of State Condoleezza Rice said Thursday that North Korea should return to nuclear disarmament talks and ...
H: North Korea says it will rejoin nuclear talks. (No, SUM)
Temporal and local restrictions (monotonicity)
T: In most Pacific countries there are very few women in parliament.
H: Women are poorly represented in parliament. (Yes (!), IR)
Textual Entailment
“We say that T entails H if the meaning of H can be inferred from the meaning of T, as would typically be interpreted by people. This somewhat informal definition is based on (and assumes) common human understanding of language as well as common background knowledge.”
“Cases in which inference is very probable (but not completely certain) are still judged True.”
(Dagan, Glickman, Magnini, RTE 2005 Workshop Proceedings)
“Circumscribing Textual Entailment”? See discussions in: Zaenen, Karttunen and Crouch (2005), Manning (2006), Crouch, Karttunen and Zaenen (2006).
A Challenge, ... in fact
• T: Hundreds of divers and treasure hunters, including the Duke of Argyll, have risked their lives in the dangerous waters of the Isle of Mull trying to discover the reputed 30,000,000 pounds in Gold carried by this vessel--the target of the most enduring treasure hunt in British history.
H: Shipwreck salvaging was attempted. (Yes, IR)
• T: The 26-member International Energy Agency said, Friday, that member countries would release oil to help relieve the U.S. fuel crisis caused by Hurricane Katrina.
H: Responding to a plea from the International Energy Agency for member countries to release reserves, Canada is prepared to help. (No, SUM)
Approximating Textual Entailment
How to reconcile obvious complexity and required depth?
• Parsing complexity
• Semantic analysis
• Argument structure, anaphora, lexical meaning, semantic and discourse relations, presupposition, ...
• Inferences based on linguistic meaning and world knowledge
• Statistical/ML approximation of Textual Entailment
• Based on state-of-the-art syntactic and shallow semantic analysis
• Measuring structural and semantic overlap
• With possibilities for extensions towards deeper modelling
• Inference on partial structures (lexical entailment)
• Targeted modelling of specific aspects, e.g. modality contexts …
A baseline system for approximating Textual Entailment
Fine-grained LFG-based syntactic analysis
• English LFG grammar (Riezler et al. 2002): broad-coverage with high-quality probabilistic disambiguation
Frame Semantics
• Coarse-grained lexical-semantic classification of predicates with role-based argument structure encoding
• Extended semantic representations: WordNet senses, SUMO concepts
Computing structural and semantic overlap
• Hypothesis: high/low ratio of H/T overlap => entailment: yes/no
• A learning problem: measures of overlap, weighted entailment decision
[Figure: H/T matching for TE, match graph size relative to hypothesis graph size, for three text/hypothesis pairs]
The SALSA RTE System
Linguistic analysis components and integration
• XLE parsing: LFG f-structure
• Fred/Detour + Rosy: frames & roles
• WordNet-based WSD: WordNet & SUMO
• f-structure w/ (extended) frame-semantic projection, using XLE term rewriting system (Crouch 2005)
Recognizing Textual Entailment: Graph matching & statistical approximation
• Text and hypothesis: f-structure w/ frames & concepts
• Text-hypothesis match graph: matching nodes and edges; different match types (similarity types); extensions for deeper modelling (modality, lexical entailment)
• Feature extraction
• Model training & classification
Linguistic Components: LFG analysis combined with FrameNet frames
Deep syntactic LFG analysis
• Broad-coverage grammar with probabilistic disambiguation
• Fine-grained grammatical function analysis with integrated NER
• Performance on RTE-II development and test set:
• Coverage: 99% (86% full parses, 13% partial parses)
• On RTE H/T pairs: 76% fully analysed pairs – 2% single analysis only
Frame semantic analysis
• Focusing on lexical semantic classes and role-based argument structure
• Disregarding aspects of “deep” semantics: modality, quantification, ...
• Normalisation over syntactic and lexical alternations (diatheses, lexicalisation, PoS)
Goal: determine semantic similarity based on lexical meaning, combined with similarity of argument structure, at a high level of abstraction
Linguistic Components: Frame and role assignment
Shalmaneser (Erk & Pado, 2006)
• Shallow semantic parser for FrameNet frame and role assignment
• Fred: statistical frame assignment
• WSD system for predicates, in terms of frames
• Rosy: semantic role assignment
• Argument recognition and argument labelling
• Using state-of-the-art features from robust syntactic parsing
Detour (to FrameNet via WordNet) (Burchardt et al., 2005)
• Aim: overcome lexical gaps in FrameNet
• A rule-based frame assignment system that takes a “detour to FrameNet via WordNet”
• Determine similarity of “unknown LUs” to existing frames (their LUs) based on WordNet-similarity measures
Linguistic Components: Frame and role assignment
[Figures: annotation by Fred & Rosy; annotation by Fred, Detour & Rosy]
Linguistic Components: Frame and role assignment
[Figure: Fred & Detour, different sense assignments (FN coverage)]
Linguistic Components: Integration and extended semantics projection
Porting frame and role assignments to LFG f-structure
• Defining a frame semantics projection using head lemmata as interface layer (accounts for parser discrepancies)
• Using XLE rewrite system (Crouch 2005)
[Figure: head-indexed frame & role assignments]
Linguistic Components: Integration and extended semantics projection
Rule-based extensions of LFG-frame structures
• Frames corresponding to LFG NE classes
• Locations, companies, dates, …
• Extra-thematic roles, based on LFG adjunct classes, etc.
• Time, Reason, Location, Concessive, …
Example rule: +adjunct(Z,Y), ntype_sem(Y,time) ==> s::(Z,SemZ), s::(Y,SemY), time(SemZ,SemY).
Extended semantics projection: WordNet and SUMO classes
• WSD: Banerjee & Pedersen, 2003
• WordNet – SUMO/MILO mapping: Niles and Pease (2001)
Linguistic Components: Integration and extended semantics projection
Normalisations of syntactic structure
• Passive: mapping SUBJ and OBJ to dsubj and dobj argument slots
• Coindexing relative pronouns and relativised head, appositives, etc.
• Heuristic rules collect antecedent candidate sets for pronominals
FEF: Frame-Exchange-Format
• (Partial) visualisation of extended syntactic-semantic graph structures in FEFViewer (Alexander Koller, Coli Saarbrücken)
[Figure: FEFViewer graph for “A shark attacked a human being.”]
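The passive normalisation above can be illustrated with a minimal sketch. It is not the XLE rewrite rule used in the system: the dictionary encoding of an f-structure, the `passive` flag and the `OBL-AG` key for the by-phrase are assumptions made for illustration only.

```python
def normalise_passive(fstructure: dict) -> dict:
    """Map surface grammatical functions to dsubj/dobj argument slots.

    Assumed toy encoding (hypothetical): 'SUBJ', 'OBJ', 'OBL-AG' hold
    lemma strings, and 'passive' is a boolean flag.
    """
    normalised = {"pred": fstructure["pred"]}
    if fstructure.get("passive"):
        # In a passive clause the surface subject is the deep object,
        # and the by-phrase (if any) is the deep subject.
        normalised["dsubj"] = fstructure.get("OBL-AG")
        normalised["dobj"] = fstructure.get("SUBJ")
    else:
        normalised["dsubj"] = fstructure.get("SUBJ")
        normalised["dobj"] = fstructure.get("OBJ")
    return normalised


# "A shark attacked a human being."
active = {"pred": "attack", "SUBJ": "shark", "OBJ": "human being"}
# "A human being was attacked by a shark."
passive = {"pred": "attack", "passive": True,
           "SUBJ": "human being", "OBL-AG": "shark"}

print(normalise_passive(active) == normalise_passive(passive))
```

After normalisation both variants expose the same argument slots, so a later matching step can compare them directly.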
A walk-through example from RTE 2006
Pair 716
Text: In 1983, Aki Kaurismäki directed his first full-time feature.
Hypothesis: Aki Kaurismäki directed a film.
Automatic Frame Annotation for Text in SALTO Viewer
[Figure: Collins parse with Fred & Rosy frames & roles (statistical) and Detour System frames (via WordNet)]
Automatic Frame Annotation for Hypothesis
716_h: Aki Kaurismäki directed a film.
LFG and Frames for Hypothesis in FEFViewer
[Figure: “Aki Kaurismäki directed a film.”, including rule-based (LFG-NER) frames]
Hypothesis-Text-Match Graphs: Computing structural and semantic overlap
Computing structural and semantic overlap
• Computing a “match graph” from text and hypothesis graphs
• Matches are established by different aspects and degrees of “similarity”
Approximating textual entailment
• High/low overlap ratio of hypothesis and match graph => entailment: yes/no
[Figure: H/T matching for TE, match graph size relative to hypothesis graph size]
Hypothesis-Text-Match Graphs
Different matching strategies
• Match graph/Text overlap (I): matched material as a proportion of all material in the Text
• Match graph/Hypothesis overlap (II): matched material as a proportion of all material in the Hypothesis
T: Leo Fender invented the first electric guitar and the electric bass guitar.
H: Leo Fender invented the first electric guitar.
I: 7/12 = 58% – II: 7/7 = 100%
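The two ratios in the Leo Fender example can be reproduced with a short sketch. The function name and the use of plain size counts (7 matched units, 12 in the text, 7 in the hypothesis) are illustrative assumptions; the system itself works on graph nodes and edges.

```python
def overlap_ratios(match_size: int, text_size: int, hypothesis_size: int):
    """Return (match/text, match/hypothesis) overlap ratios."""
    if text_size <= 0 or hypothesis_size <= 0:
        raise ValueError("graph sizes must be positive")
    return match_size / text_size, match_size / hypothesis_size


# Leo Fender example: 7 matched units, 12 in the text, 7 in the hypothesis.
ratio_text, ratio_hyp = overlap_ratios(7, 12, 7)
print(f"I: {ratio_text:.0%}  II: {ratio_hyp:.0%}")
```

Measure II stays at 100% however long the text is, which is why the hypothesis-relative ratio is the more natural indicator of entailment here.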
Hypothesis-Text-Match Graphs: Computing structural and semantic overlap
Graph matching using XLE rewrite system
• Defining different types of match conditions on t- and h-graph, triggering new nodes and edges in m-graph, with match-type info
• Matching algorithm tied to rewrite-logic
• Locally defined matches (no graph traversal)
• Starting with (multiple) node matches
• Edge matches: restricted to connect matched nodes
Example (text-hypothesis ==> text-hypothesis-match):
frame(h:x1,killing), frame(t:y1,killing) ==> frame(m:(z1,x1,y1),killing), match_type(m:(z1,x1,y1),killing,frame)
Rewrite rule:
+frame(h:X1,Frame), +frame(t:Y1,Frame) ==> frame(m:(Z1,X1,Y1),Frame), match_type(m:(Z1,X1,Y1),Frame,frame).
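The logic of the frame-match rule, and of edge matches restricted to matched nodes, can be paraphrased outside the XLE rewrite system. The sketch below is plain Python, not XLE transfer syntax; the dictionary and tuple encodings and all node ids, frames and role labels in the example are hypothetical toy data.

```python
from itertools import product


def match_frames(h_frames: dict, t_frames: dict) -> list:
    """Pair every hypothesis node with every text node carrying the same frame."""
    return [
        {"node": (h, t), "frame": h_frame, "match_type": "frame"}
        for (h, h_frame), (t, t_frame) in product(h_frames.items(), t_frames.items())
        if h_frame == t_frame
    ]


def match_edges(h_edges: set, t_edges: set, node_matches: list) -> list:
    """Keep an edge only if its label is identical and both ends are matched nodes."""
    pairs = {m["node"] for m in node_matches}
    return [
        ((h1, t1), label, (h2, t2))
        for (h1, label, h2) in sorted(h_edges)
        for (t1, t_label, t2) in sorted(t_edges)
        if label == t_label and (h1, t1) in pairs and (h2, t2) in pairs
    ]


# Toy graphs: node id -> frame, and (source, role, target) edges.
h_frames = {"x1": "killing", "x2": "people"}
t_frames = {"y1": "killing", "y2": "people", "y3": "weapon"}
h_edges = {("x1", "victim", "x2")}
t_edges = {("y1", "victim", "y2"), ("y1", "instrument", "y3")}

node_matches = match_frames(h_frames, t_frames)
edge_matches = match_edges(h_edges, t_edges, node_matches)
print(node_matches)
print(edge_matches)
```

As on the slide, matches are established locally: node matches come first, and no graph traversal is needed to decide whether an edge belongs to the match graph.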
Hypothesis-Text-Match Graphs: Computing structural and semantic overlap
Aspects of similarity
• Syntax-based (i.e. lexical and structural) similarity
• Identical PREDs and attribute values trigger node matches
• Identical ATTRIBUTES (GF, morph. features) trigger edge matches
• Semantics-based similarity
• Identical FRAMES and CONCEPTS trigger node matches
• Identical ROLES trigger edge matches
• Match graph consists of identical partial syntactic & semantic graphs
Degrees of similarity (strict vs. weak matching)
• Non-identical, but “structurally related” PREDs
• coreferentially related (relative clauses, appositives, pronominals)
• Non-identical, but “semantically related” PREDs (WN-related, path<3)
• Non-identical, but “semantically related” FRAMES (FN-/Detour-related)
• Match graph establishes overlapping partial graphs (marked by match types)
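The distinction between strict and weak predicate matching (WN-related, path<3) can be sketched as follows. The relation graph below is a hypothetical toy stand-in, not real WordNet data, and the match-type labels are assumptions for illustration.

```python
from collections import deque

# Hypothetical toy relation graph standing in for WordNet (not real WordNet data).
RELATED = {
    "film": {"movie"},
    "movie": {"film", "feature"},
    "feature": {"movie"},
    "guitar": {"instrument"},
    "instrument": {"guitar"},
}


def path_length(source, target, graph):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if source == target:
        return 0
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def pred_match_type(h_pred, t_pred, graph=RELATED, max_path=3):
    """Strict match for identical PREDs, weak match for related PREDs with path < max_path."""
    if h_pred == t_pred:
        return "identical"
    dist = path_length(h_pred, t_pred, graph)
    if dist is not None and dist < max_path:
        return "wn_related"
    return None


print(pred_match_type("film", "film"))
print(pred_match_type("film", "feature"))
print(pred_match_type("film", "guitar"))
```

Recording the match type on each node lets later feature extraction weigh strict and weak matches differently.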
[Figure: match graph with grammatically related and WordNet related matches]
h: Aki Kaurismäki directed a film.
t: In 1983, Aki Kaurismäki directed his first full-time feature.
Approximating Textual Entailment
Extensions for deeper modelling: Modality
Detecting indicators of inconsistent modality types
• T: A pet must have rabies protection confirmed by a blood test.
H: A case of rabies was confirmed.
Marking modal contexts in text and hypothesis
• 5 modality types: conditional, future, diamond, box, negation
Handling inconsistent modality types in matching process
• Introducing negatively marked match nodes
• Blocking embedded structures for similarity-based matches
• Thus, reducing the size of the match graph
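A minimal sketch of how negatively marked match nodes shrink the match graph. The node encoding is hypothetical, the assignment of "must" to the box type is an assumption, and the blocking of embedded structures is not modelled here.

```python
MODALITY_TYPES = {"conditional", "future", "diamond", "box", "negation"}


def mark_match(h_node: dict, t_node: dict) -> dict:
    """Create a match node; mark it negative when the modal contexts differ.

    A missing 'modality' key (None) means no marked modal context.
    """
    h_mod, t_mod = h_node.get("modality"), t_node.get("modality")
    for mod in (h_mod, t_mod):
        if mod is not None and mod not in MODALITY_TYPES:
            raise ValueError(f"unknown modality type: {mod}")
    return {"pred": h_node["pred"], "positive": h_mod == t_mod}


def match_graph_size(matches: list) -> int:
    """Count only positively marked match nodes."""
    return sum(1 for m in matches if m["positive"])


# T: "A pet must have rabies protection confirmed by a blood test."
# H: "A case of rabies was confirmed."
# Assumption: "must" places "confirm" in a box (necessity) context in T.
t_confirm = {"pred": "confirm", "modality": "box"}
h_confirm = {"pred": "confirm"}
# Hypothetical control pair with no marked modal context on either side.
t_plain = {"pred": "invent"}
h_plain = {"pred": "invent"}

matches = [mark_match(h_confirm, t_confirm), mark_match(h_plain, t_plain)]
print(match_graph_size(matches))
```

Only the control pair counts towards the match graph, so the overlap ratio for the rabies pair drops even though the predicates are identical.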
Approximating Textual Entailment
Extensions for deeper modelling: Lexical Entailments
Bridging partial non-matching text and hypothesis pairs
• T: Olson, 62, previously worked as a partner at Ernst & Young LLP, as a Minnesota bank president and as a congressional aide, before joining the Fed board in 2001, to serve a term ending in 2010.
H: Olson is a member of the Fed board.
Lexically induced inferences, defined as rewrite rules on h/t/m graphs
t: (X1) joins X2, h: (Y1) member-of Y2, m:(Z2,Y2,X2) => match_type(heuristic_entailment_match).
Similar: non-lexical heuristic inferences
• Appositions: prime minister X => X is prime minister
• Possessive constructions: X’s Y => the Y of X
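One possible reading of the joins/member-of rule, sketched in Python rather than as an XLE rewrite rule. The tuple encoding, the node ids and the choice to attach the new match to the two predicate nodes are assumptions; the slide only states that an existing match on the object (Y2, X2) licenses a heuristic entailment match.

```python
def heuristic_entailment_matches(t_facts: list, h_facts: list, matched: set) -> list:
    """Bridge 'X1 joins X2' (text) and 'Y1 member-of Y2' (hypothesis).

    Facts are (predicate_node, relation, object_node) triples; 'matched'
    holds already established (hypothesis_node, text_node) pairs.
    """
    new_matches = []
    for t_pred, t_rel, t_obj in t_facts:
        for h_pred, h_rel, h_obj in h_facts:
            if (t_rel == "join" and h_rel == "member-of"
                    and (h_obj, t_obj) in matched):
                new_matches.append({
                    "node": (h_pred, t_pred),
                    "match_type": "heuristic_entailment_match",
                })
    return new_matches


# Olson example with hypothetical node ids: t7 and h4 are the "Fed board" nodes.
t_facts = [("t5", "join", "t7")]
h_facts = [("h2", "member-of", "h4")]
matched = {("h4", "t7")}

print(heuristic_entailment_matches(t_facts, h_facts, matched))
```

Without the prior match on the Fed board nodes the rule does not fire, which keeps the inference local to material that already overlaps.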
Approximating Textual Entailment: Machine learning
Feature selection with WEKA Classifiers
• Many learners select intuitively important features, but also “idiosyncratic” ones
Selected learners and models
• Model 1: Simple Conjunctive Rule classifier, which generated a single rule
Rule: preds_m_relto_h 0.485294 & frames_m_relto_h 0.954546 rte_entails = 0
Medium/high threshold on pred/frame matches as criterion for rejection
High degree of frame similarity with medium predicate similarity models entailment
• Model 2: Meta-classifier LogitBoost (additive logistic regression)
Features (1.-4.) used in iteration; final feature set: 1.,2.,4.
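The single rule of Model 1 can be written out as a tiny classifier. The slide preserves only the two feature names, the two thresholds and the outcome rte_entails = 0; the comparison direction ("at or below threshold") and the default class 1 are assumptions, read off the description of the thresholds as a "criterion for rejection".

```python
PRED_THRESHOLD = 0.485294   # threshold on preds_m_relto_h (from the slide)
FRAME_THRESHOLD = 0.954546  # threshold on frames_m_relto_h (from the slide)


def rte_entails(preds_m_relto_h: float, frames_m_relto_h: float) -> int:
    """Single conjunctive rule: reject (0) when both overlap features are low.

    ASSUMPTION: the '<=' operators and the default class 1 are not given
    on the slide; they are one plausible reading of the rule.
    """
    if preds_m_relto_h <= PRED_THRESHOLD and frames_m_relto_h <= FRAME_THRESHOLD:
        return 0
    return 1


print(rte_entails(0.30, 0.50))  # low predicate and frame overlap
print(rte_entails(0.60, 0.99))  # higher overlap on both features
```

Under this reading, the frame threshold is very high, so a pair is accepted mainly when predicate overlap exceeds the medium threshold or frame overlap is near complete.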
Results in RTE-II
SALSA RTE system results
• Both models score SUM > IR > QA > IE
• Refined model better on QA – simple model better on SUM
Overall RTE-II results
• Average accuracy: 60% (Median: 59%)
• Shallow overlap measures vary considerably between data sets, whereas “deeper” approaches remain more stable
• Tendency towards deeper, knowledge-rich methods
Discussion of Results: True positives
High ratio of matching predicates, frames, and f-structure
Typical phenomena
• Non-identical predicates compensated by matching frames (626)
• Missing frame assignments compensated by WN relatedness
• die – pass away (wn-related, 103)
• Active-passive diathesis resolved by f-structure normalisation (129)
Relative overlap measures also work for longer hypotheses
Discussion of Results: True negatives
Modal context marking seems to be effective
• 27% of all true negatives involved modality mismatches, while only 11.9% of all sentences involve marked modal contexts
Future plans
• Extend to lexically induced modality/facticity indicators
• Testing for non-monotonicity contexts
Error analysis: False positives
Typical cases
Semantic dissimilarity
• Non-matching predicates within larger match graphs, which are in fact semantically dissimilar
Structural distance
• Matching nodes within a match graph correspond to far distant nodes in the text graph – compared to neighbouring nodes in the match graph
Error analysis: False positives
Unconnected nodes matched with distant nodes in text graph
T: Some 420 people have been hanged in Singapore since 1991, mostly for drug trafficking, an Amnesty International 2004 report said. That gives the country of 4.4 million people the highest execution rate in the world relative to population.
H: 4.4 million people were executed in Singapore. (198) – False positive
Error analysis: False positives
Graph matching process
• Not a top-down process
• Starts by relating any nodes, and builds growing clusters by finding matching edges
• This allows criss-cross matching of nodes in the match graph
• Introduce weighted edges that reflect the relative distance of pairs of match nodes in text and hypothesis (path distance)
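The proposed distance weighting can be sketched as follows. The weighting formula 1 / (1 + |d_text - d_hyp|) is an assumption chosen for illustration, not a measure taken from the slides, and the two small graphs are a hypothetical simplification of pair 198.

```python
from collections import deque


def path_distance(graph: dict, start: str, goal: str):
    """Breadth-first search over an adjacency dict; None if unreachable."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def edge_weight(h_graph: dict, t_graph: dict, match_a: tuple, match_b: tuple) -> float:
    """Weight a pair of match nodes (hypothesis_node, text_node) by how
    similar their path distance is in the hypothesis and in the text."""
    h_dist = path_distance(h_graph, match_a[0], match_b[0])
    t_dist = path_distance(t_graph, match_a[1], match_b[1])
    if h_dist is None or t_dist is None:
        return 0.0
    return 1.0 / (1.0 + abs(t_dist - h_dist))


# Hypothesis: "people" is a direct argument of "execute".
h_graph = {"execute": ["people"], "people": ["execute"]}
# Text: "people" is several steps away from "execution".
t_graph = {
    "execution": ["rate"],
    "rate": ["execution", "give"],
    "give": ["rate", "country"],
    "country": ["give", "people"],
    "people": ["country"],
}

print(edge_weight(h_graph, t_graph, ("execute", "execution"), ("people", "people")))
```

Neighbouring nodes in the hypothesis that map to distant nodes in the text receive a low weight, which penalises the criss-cross matches behind false positives such as pair 198.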
Conclusions
A medium-depth approach: Approximating Textual Entailment
• Lexical and syntactic overlap, semantic similarity (WordNet)
• Frame semantics: lexical semantic classes & argument structure
• Flexible graph matching method with extensions to deeper processing
• Modality contexts, lexical inferences
Perspectives for future extensions
• Engineering and fine-tuning
• Combination with shallow (and deeper) methods in voting architecture
• Frame and role assignment
• Sense discrimination: outlier detection (Erk, 2006)
• Coverage: integration with other resources (VerbNet, NomBank)
• Modelling dissimilarity
• Semantic distance measures and distance-weighted graph edges
• Acquisition of lexical modality indicators and (lexical) entailment rules
References
RTE Proceedings
• RTE Challenge Homepage: http://www.pascal-network.org/Challenges/RTE2
• I. Dagan, O. Glickman, and B. Magnini (2005): “The PASCAL recognising textual entailment challenge”. In Proceedings of the RTE-1 Workshop, Southampton, UK.
• B. Magnini and I. Dagan, editors (2006): Proceedings of the Second PASCAL Recognising Textual Entailment Challenge, Venice, Italy.
• Electronic proceedings and slides: http://ir-srv.cs.biu.ac.il:64080/RTE2/proceedings/
Discussion about RTE Task
• Zaenen, Karttunen and Crouch (2005): “Local Textual Inference: can it be defined or circumscribed?” In ACL 2005 Workshop on Empirical Modelling of Semantic Equivalence and Entailment, Ann Arbor, Michigan.
• Manning (2006): “Local Textual Inference: It's hard to circumscribe, but you know it when you see it - and NLP needs it”, MS. Stanford University.
• Crouch, Karttunen and Zaenen (2006): “Circumscribing is not excluding: A reply to Manning”, MS. Palo Alto Research Center.
• All papers: http://www2.parc.com/istl/members/zaenen/
References
• A. Burchardt and A. Frank (2006): “Approximating Textual Entailment with LFG and FrameNet Frames.” In Proceedings of the Second Recognising Textual Entailment Workshop, Venice, Italy. http://www.coli.uni-saarland.de/projects/salsa/page.php?id=publications
• K. Erk and S. Pado (2006): “Shalmaneser - a flexible toolbox for semantic role assignment.” In Proceedings of LREC-06, Genoa. http://www.coli.uni-saarland.de/projects/salsa/page.php?id=publications
• A. Burchardt, K. Erk, and A. Frank (2005): “A WordNet Detour to FrameNet.” In Proceedings of the GLDV 2005 Workshop GermaNet II, Bonn. http://www.coli.uni-saarland.de/projects/salsa/page.php?id=publications
• R. Crouch (2005): “Packed Rewriting for Mapping Semantics to KR.” In Proceedings of the Sixth International Workshop on Computational Semantics, Tilburg. http://www2.parc.com/istl/groups/nltt/papers/iwcs05_crouch.pdf
Approximating Textual Entailment: Similarity/Entailment measures and feature extraction
Error analysis: Sparse features
Feature set
• High-frequency features that measure similarity
• Few, and low-frequency, features that model dissimilarity
• Bias towards similarity
• 29.5% false positives
• 12.75% false negatives
Plans for further development
• Introducing distance measures (semantic and structural)
• Getting a grip on remaining differences, i.e. non-matched edges between matching clusters