Recognizing Textual Entailment Challenge (PASCAL) Suleiman BaniHani
Textual Entailment • Textual entailment recognition is the task of deciding, given two text fragments, whether the meaning of one text can be inferred from (is entailed by) the other.
Task Definition • Given pairs of small text snippets, referred to as Text-Hypothesis (T-H) pairs, build a system that decides for each T-H pair whether T entails H. Results are compared against a manual gold standard produced by human annotators. • Example: • T: Kurdistan Regional Government Prime Minister Dr. Barham Salih was unharmed after an assassination attempt. • H: Prime minister targeted for assassination
Dataset Collection and Application Settings • The dataset of Text-Hypothesis pairs was collected by human annotators. It consists of seven subsets: • Information Retrieval (IR) • Comparable Documents (CD) • Reading Comprehension (RC) • Question Answering (QA) • Information Extraction (IE) • Machine Translation (MT) • Paraphrase Acquisition (PP)
Approaching textual entailment recognition • Solution approaches can be categorized as: • Deep analysis or “understanding” • Uses various types of linguistic knowledge and resources to accurately recognize textual entailment • Patterns of entailment (e.g. lexical relations, syntactic alternations) • Processing technology (word co-occurrence statistics, thesauri, parsing, etc.) • Shallow approaches
Baseline • Given that half the pairs are FALSE, the simplest baseline is to label all pairs FALSE. This will achieve 50% accuracy.
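A minimal sketch of this majority-class baseline, with hypothetical gold labels since the slides give no data format:

```python
# Majority-class baseline: label every T-H pair FALSE.
gold = [True, False, False, True, False, True]   # hypothetical gold labels

predictions = [False] * len(gold)                # always predict "no entailment"
accuracy = sum(p == g for p, g in zip(predictions, gold)) / len(gold)
print(f"baseline accuracy: {accuracy:.2f}")      # ~0.50 when classes are balanced
```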
Application of the BLEU (BiLingual Evaluation Understudy) algorithm • A shallow approach operating at the lexical level. • It computes the percentage of n-grams shared between a candidate text and the human reference; the typical values of n are 1, 2, 3, and 4. • Each n-gram's count is clipped to its maximum frequency in the reference. • The per-n results are combined, and a penalty is applied to short texts. • Scored 54% on the development set and 50% on the test set. • Good results on CD, bad results on IE and IR. • Problem: it captures no syntax or semantics, such as synonyms and antonyms.
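A minimal sketch of the clipped n-gram precision at the heart of BLEU, averaged over n = 1..4; the brevity penalty and the exact weighting used in the submission are omitted, and the T-H pair is hypothetical:

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, reference, n):
    """Modified n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in the reference (the BLEU 'clipping')."""
    cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
    clipped = sum(min(count, ref[g]) for g, count in cand.items())
    return clipped / max(sum(cand.values()), 1)

# Hypothetical T-H pair; entailment is predicted when the combined score
# (here a plain average over n = 1..4, without the brevity penalty) is high.
t = "the cat sat on the mat".split()
h = "the cat sat on a mat".split()
score = sum(clipped_precision(h, t, n) for n in range(1, 5)) / 4
print(f"BLEU-style score: {score:.2f}")
```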
Syntactic similarities • Human annotators were asked to divide the dataset into: • True by syntax • False by syntax • Not syntax • Cannot decide • A robust parser was then used to reproduce these judgments. • Only a partial submission was provided, and humans were used for the test.
Tree edit distance • The text as well as the hypothesis is transformed into a tree, using a sentence splitter and a parser to create the syntactic representation. • A matching module finds the best sequence of editing operations that obtains H from T. • Each editing operation (deletion, insertion, and substitution) is given a relative cost. • Finally the total score is evaluated; if it exceeds a certain threshold, the pair is labeled TRUE. • High accuracy on CD but 55% overall. • Should be enriched with resources such as WordNet and other libraries.
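A minimal sketch of the cost model. The real system runs the dynamic program over parse trees (in the style of tree edit distance); this stand-in flattens them to token sequences to keep the code short, and the operation costs and threshold are hypothetical:

```python
DEL, INS, SUB = 1.0, 1.0, 1.5    # illustrative relative costs

def edit_cost(t_tokens, h_tokens):
    """Minimum total cost of edit operations turning T into H."""
    m, n = len(t_tokens), len(h_tokens)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * DEL
    for j in range(1, n + 1):
        d[0][j] = j * INS
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = 0.0 if t_tokens[i - 1] == h_tokens[j - 1] else SUB
            d[i][j] = min(d[i - 1][j] + DEL,        # delete a T token
                          d[i][j - 1] + INS,        # insert a token required by H
                          d[i - 1][j - 1] + same)   # match or substitute
    return d[m][n]

t = "the cat devours the mouse".split()
h = "the cat eats the mouse".split()
THRESHOLD = 2.0                          # hypothetical decision threshold
# Low edit cost means high similarity, i.e. the score exceeds the threshold.
print(edit_cost(t, h) <= THRESHOLD)      # True: entailment predicted
```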
Dependency Analysis and WordNet • A dependency parser is used to normalize the data into an appropriate tree representation. • A lexical entailment module then checks whether sub-branches of T and H can be entailed from one another, using: • Synonymy and similarity • Hyponymy and WordNet entailment, e.g. “death” entails “kill”. • Multiwords, e.g. “melanoma” entails “skin cancer”. • Negation and antonymy, where negation is propagated through the tree leaves. • The dependency trees are then matched by an algorithm that searches for matching branches between T and H. • Results show a high score on CD and between 42% and 55% on the other subsets.
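A minimal sketch of the synonymy/hyponymy part of such a lexical entailment check, assuming NLTK's WordNet interface (`nltk.corpus.wordnet`); the submitted module also handled multiwords, negation, and antonymy:

```python
from nltk.corpus import wordnet as wn

def lexically_entails(t_word, h_word, pos=wn.VERB):
    """True if some sense of h_word is a synonym or (transitive)
    hypernym of some sense of t_word."""
    h_synsets = set(wn.synsets(h_word, pos=pos))
    for t_syn in wn.synsets(t_word, pos=pos):
        if t_syn in h_synsets:                        # synonymy
            return True
        hypernym_closure = set(t_syn.closure(lambda s: s.hypernyms()))
        if hypernym_closure & h_synsets:              # hyponymy: t is-a h
            return True
    return False

print(lexically_entails("devour", "eat"))   # expected True: devour is-a eat
print(lexically_entails("eat", "devour"))   # expected False: wrong direction
```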
Syntactic Graph Distance: a rule-based and an SVM-based approach • Uses graph distance theory, where a graph represents the H-T pair. • Uses similarity measures to determine entailment: • T semantically subsumes H, e.g. H: [The cat eats the mouse], T: [The cat devours the mouse] (“eat” generalizes “devour”). • T syntactically subsumes H, e.g. H: [The cat eats the mouse], T: [The cat eats the mouse in the garden] (T contains a specializing prepositional phrase). • T directly implies H, e.g. H: [The cat killed the mouse], T: [The cat devours the mouse].
Cont. • A rule-based system computes the following: • Node similarity • Syntactic similarity • Semantic similarity • A machine learning technique is then applied to tune the parameters and make the final decision. • Results: high for CD (.76), .44-.59 for the others. A node-matching sketch follows below.
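A minimal sketch of a node/edge matching score over dependency graphs; the weights are hypothetical stand-ins for the parameters that the rule-based and SVM runs tuned:

```python
W_NODE, W_EDGE = 0.6, 0.4        # illustrative weights

def graph_similarity(t_edges, h_edges):
    """Graphs are sets of (head, relation, dependent) triples; the score
    measures how much of H's graph is covered by T's graph."""
    t_nodes = {w for g, _, d in t_edges for w in (g, d)}
    h_nodes = {w for g, _, d in h_edges for w in (g, d)}
    node_overlap = len(t_nodes & h_nodes) / max(len(h_nodes), 1)
    edge_overlap = len(set(t_edges) & set(h_edges)) / max(len(h_edges), 1)
    return W_NODE * node_overlap + W_EDGE * edge_overlap

t = {("eats", "nsubj", "cat"), ("eats", "dobj", "mouse"),
     ("eats", "prep_in", "garden")}
h = {("eats", "nsubj", "cat"), ("eats", "dobj", "mouse")}
print(graph_similarity(t, h))    # 1.0: T syntactically subsumes H
```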
Hierarchical knowledge representation • A hierarchical, logic-based representation of the T-H pairs, using a description-logic-inspired language, Extended Feature Description Logic (EFDL), which is similar to conceptual graphs. • Nodes in the graph represent words or phrases. • Manually written rewriting rules are used for the semantic and syntactic representations. • A sentence in the text can have several alternative representations. • Entailment holds if any of the sentence representations can infer H. • Results: 64.8% on the development set and 56.1% on the test set; highest on CD, lowest on QA (50%).
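A minimal sketch of how manually written rewriting rules can expand T into alternative representations; the rule, the flat tuple forms, and the naive single-token matcher are all hypothetical simplifications of EFDL:

```python
# One hypothetical rewrite rule; 'X' is a variable binding a single token.
REWRITE_RULES = {
    ("assassination", "attempt", "on", "X"):
        ("X", "targeted", "for", "assassination"),
}

def alternatives(form):
    """Yield the original form plus every rule-rewritten variant."""
    yield form
    for lhs, rhs in REWRITE_RULES.items():
        if len(form) != len(lhs):
            continue
        binding = {}
        if all(a == b or (b == "X" and binding.setdefault("X", a) == a)
               for a, b in zip(form, lhs)):
            yield tuple(binding.get(tok, tok) for tok in rhs)

t_form = ("assassination", "attempt", "on", "Salih")
h_form = ("Salih", "targeted", "for", "assassination")
# Entailment is predicted if any alternative of T matches H's representation.
print(any(alt == h_form for alt in alternatives(t_form)))   # True
```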
Logic-like formula representation • A parser transforms the T-H pair into a graph of logical phrases, where the nodes are the words and the links are the relations. • A matching score is given to each pair of terms. • A theorem prover is used to find the proof with the lowest cost. • The final cost is compared to a threshold; if it is lower, entailment is considered proved. • High results on CD (79%), lowest on MT (47%), 55% on average.
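A minimal sketch of cost-based term matching; the substitution costs, the greedy alignment, and the threshold are hypothetical simplifications of the lowest-cost proof search:

```python
MATCH_COST = {("devour", "eat"): 0.2}   # hypothetical lexical substitution cost

def term_cost(t_term, h_term):
    """Cost of rewriting one T term into one H term."""
    if t_term == h_term:
        return 0.0
    return MATCH_COST.get((t_term, h_term), 1.0)   # 1.0 = no cheap match

def proof_cost(t_terms, h_terms):
    """Greedy lowest-cost alignment of each H term to some T term."""
    return sum(min(term_cost(t, h) for t in t_terms) for h in h_terms)

t = ["cat", "devour", "mouse"]
h = ["cat", "eat", "mouse"]
THRESHOLD = 0.5                          # hypothetical
print(proof_cost(t, h) <= THRESHOLD)     # True: entailment predicted
```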
Atomic Propositions • Finds the entailment relation by comparing the atomic propositions contained in T and H. • The comparison of the atomic propositions is done using the deduction system OTTER. • The atomic propositions are extracted from the text using a parser. • WordNet is used for word relations. • A semantic analyzer transforms the parser output into first-order logic. • Low accuracy (.5188), especially on QA (47%).
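A minimal sketch of how a parsed clause might be rendered as first-order atoms of the kind a prover such as OTTER consumes, using a neo-Davidsonian event style; the predicate names are illustrative, not the system's actual output:

```python
def to_fol(subject, verb, obj, event="e1"):
    """Render a subject-verb-object clause as first-order atoms
    over an event variable."""
    return [f"{verb}({event})",
            f"agent({event},{subject})",
            f"patient({event},{obj})"]

print(to_fol("cat", "eat", "mouse"))
# ['eat(e1)', 'agent(e1,cat)', 'patient(e1,mouse)']
```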
Combining a shallow overlap technique with deep theorem proving • In the shallow stage, a simple frequency test of overlapping words is used. • In the deep stage, a CCG parser is used to generate a DRS (Discourse Representation Structure, from Discourse Representation Theory), which is transformed into first-order logic. • The Vampire theorem prover and the Paradox model builder were used for the entailment proof. • A knowledge base was used to ground the results in the real world: • WordNet • Geographical knowledge from the CIA World Factbook • Generic axioms for, for instance, the semantics of possessives, active-passive alternations, and locations. • The combined system has an accuracy of .562, while the shallow approach alone reaches .55.
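A minimal sketch of the shallow/deep combination; `prover_says_entailed` is a hypothetical stand-in for the Vampire/Paradox verdict, and the overlap threshold is illustrative:

```python
def word_overlap(t, h):
    """Fraction of H's word types that also appear in T."""
    t_set, h_set = set(t.lower().split()), set(h.lower().split())
    return len(t_set & h_set) / max(len(h_set), 1)

def decide(t, h, prover_says_entailed, overlap_threshold=0.6):
    # Trust a positive proof outright; otherwise fall back to the
    # shallow word-overlap statistic.
    return prover_says_entailed or word_overlap(t, h) >= overlap_threshold

t = "Salih was unharmed after an assassination attempt"
h = "Salih survived an assassination attempt"
print(decide(t, h, prover_says_entailed=False))   # True via overlap (0.8)
```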
Applying the COGEX logic prover • First a parser converts the text into logic form. • Then COGEX, a modified version of OTTER, is used. • The prover requires a set of clauses called the “set of support”, which is used to initiate the search for inferences. • The set of support is loaded with the negated form of the hypothesis as well as the predicates that make up the text passage. • Another required list, the usable list, contains the clauses OTTER uses to generate inferences. • The usable list consists of all the axioms that have been generated either automatically or by hand: • World knowledge axioms (manual) • NLP axioms (SS and SM) • WordNet lexical chains • Overall accuracy .551; many errors originate in the parsing stage.
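A minimal sketch of how the prover's input is assembled; the clause strings and the lexical-chain axiom are illustrative placeholders, since COGEX ingests real first-order clauses:

```python
text_clauses = ["devour(e1)", "agent(e1,cat)", "patient(e1,mouse)"]
hypothesis = "exists e (eat(e) & agent(e,cat) & patient(e,mouse))"

# Refutation setup: the negated hypothesis joins the text predicates in the
# set of support; axioms (world knowledge, NLP axioms, WordNet lexical
# chains) go on the usable list.
set_of_support = text_clauses + [f"-({hypothesis})"]
usable_list = ["all e (devour(e) -> eat(e))"]   # hypothetical lexical-chain axiom

print(set_of_support)
print(usable_list)
```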
Comparing task accuracy • CD – Comparable Documents • IE – Information Extraction • QA – Question Answering • IR – Information Retrieval • MT – Machine Translation • RC – Reading Comprehension • PP – Paraphrase Acquisition
Future work • Search for a candidate parser to transform natural language into first-order logic. • Use the largest possible knowledge base to capture similarity. • Search for a robust theorem prover.
References • The first PASCAL Recognising Textual Entailment Challenge (RTE I) • Ido Dagan, Oren Glickman and Bernardo Magnini. The PASCAL Recognising Textual Entailment Challenge. In Proceedings of the PASCAL Challenges Workshop on Recognising Textual Entailment, 2005. • http://www.cs.biu.ac.il/~glikmao/rte05/index.html