1 / 21

Recognizing Textual Entailment Challenge PASCAL

Recognizing Textual Entailment Challenge PASCAL. Suleiman BaniHani. Textual Entailment. Textual entailment recognition: is the task of deciding, given two text fragments, whether the meaning of one text is entailed (can be inferred) from another text. Task Definition.

overton
Download Presentation

Recognizing Textual Entailment Challenge PASCAL

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Recognizing Textual Entailment ChallengePASCAL Suleiman BaniHani

  2. Textual Entailment • Textual entailment recognition: is the task of deciding, given two text fragments, whether the meaning of one text is entailed (can be inferred) from another text.

  3. Task Definition • Given pairs of small text snippets, referred to as Text-Hypothesis (T-H) pairs. Build a system that will decide for each T-H pair whether T indeed entails H or not. Results will be compared to the manual gold standard generated by annotators. • Example: • T: Kurdistan Regional Government Prime Minister Dr. Barham Salih was unharmed after an assassination attempt. • Prime minister targeted for assassination

  4. Dataset Collection and Application Settings • The dataset of Text-Hypothesis pairs was collected by human annotators. It consists of seven subsets • Information Retrieval (IR) • Comparable Documents (CD) • Reading Comprehension (RC) • Question Answering (QA) • Information Extraction (IE) • Machine Translation (MT) • Paraphrase Acquisition (PP)

  5. Approaching textual entailment recognition • Solution approaches can be categorizes as. • Deep analysis or “understanding” • Using types of linguistic knowledge and resources to accurately recognize textual entailment • Patterns of entailment (e.g. lexical relations, syntactic alternations) • Processing technology (word co-occurrence statistics, thesaurus, parsing, etc.) • Shallow approach

  6. Baseline • Given that half the pairs are FALSE, the simplest baseline is to label all pairs FALSE. This will achieve 50% accuracy.

  7. Application of the BLEU (BiLingual Evaluation Understudy) algorithm • Shallow based on lexical level. • It is based on calculating the percentage of n-grams for a given translation to the human standard one, a typical values for N are taken, i.e. 1, 2, 3, 4. • It limits each n-gram appearance to a maximum frequency. • The result of each n-gram is combined, and a penalty is added to short text. • Scored 54% for development set, and a 50% in the test set. • Good results in the CD, bad results in IE and IR. • Problem, does not recognize syntactical or semantics, such as synonyms and antonyms.

  8. Syntactic similarities • Human annotators were asked to divide the data set to • True by syntax • False by syntax • Not syntax • Cannot decide • Then using a robust parser to establish the results. • A partial submission was provided. And humans were used for the test.

  9. Tree edit distance • The text as well as the Hypothesis is transformed to a tree using a sentence splitter and a parser to create the syntactic representation. • A matching module, find the beast sequence of editing operations to obtain H from T. • Each editing operation (deletion, insertion and substitution) is given a relative score. • Finally the total score is evaluated, if it exceeds a certain limit them the pair is labeled as true. • High accuracy for CD but 55% overall accuracy. • Should be enriched by using resources as WordNet and other libraries.

  10. Dependency Analysis and WordNet • A dependency parser is used to normalized data in appropriate tree representation. • Then a lexical entailment module is used, where the sub branches of T an H can be entailed from the other using • Synonymy and similarity • Hyponymy and WordNet entailment, i.e. death entail kill. • Multiwords, i.e. melanoma entails skin-cancer. • Negation and antonymy, where negation is propagated through tree leaves. • A matching between dependency trees using a matching algorithm searching for matching branches between T and H. • Results show high score in CD and a between 42 to 55 % in other fields

  11. Syntactic Graph Distance: a rule based and a SVM based approach • Use a graph distance theory, where a graph is used to represent the H and T pair. • Use similarity measures to determine entailment • T semantically subsumes H, e.g. H: [The cat eats the mouse] and T: [the cat devours the mouse], eat generalizes devour). • T syntactically subsumes H, e.g., H: [The cat eats the mouse] and T: [the cat eats the mouse in the garden], T contains a specializing prepositional phrase). • T directly implies H (e.g., H: [The cat killed the mouse], T: [the cat devours the mouse]).

  12. Cont. • A rule based system realize the following • Node similarity • Syntactic similarity • Semantic similarity • Applying a machine learning technique to evaluate the parameters and make the final decision • Results high for CD .76 and .44-.59 for others

  13. hierarchical knowledge representation • A hierarchical logic passed representation o the T H pairs, where a description logic inspired language is used, extended feature description login (EFDL) which is similar to concept graph. • Nodes in the graph represent words or phrases. • Manually generated rewriting rules are used for semantic and syntactic representations. • A sentence in the text can have different alternatives • The evaluation is based if any of the sentence representations can infer H. • Results in the system set 64.8 while in the test 56.1, high CD lowest QA 50%

  14. Logic like formula representation • A parser is used to transfer the pair T and H to graph, of logical phrases, where the nodes are the words and the links are the relations. • A matching score is given for each pair of terms. • The theorem proof is used to find the proof with the lowest coast. • The final cost is evaluated is it is less than a threshold, then the entailment is proved. • High results in the CD 79%, Lowest with MT 47% average 55%.

  15. Atomic Propositions • Find entailment relation by comparing the atomic proposition contained in the T and H. • The comparison of the atomic propositions is done using a deduction system OTTER. • The atomic propositions are extracted from the text using a parser. • WordNet is used for word relations. • A semantic analyzer is used to transform the output of the parser to first order logic. • Low accuracy .5188 especially for QA 47%.

  16. Combining shallow over lapping technique with deep theorem proving • In the shallow stage a simple frequency test of over lapping words is used. • In the deep stage CCG–parser is used to generate DRS, discourse representation theory. Which is transformed to first order logic. • Vampire theorem prover and Paradox where used for entailment proof. • A knowledge base was used to validate results with real world. • WordNet • Geographical knowledge from CIA • Generic axioms for, for instance, the semantics of possessives, active-passives, and locations. • The combined system has accuracy of .562 while the shallow approach has an accuracy of 0.55.

  17. Applying COGEX logic prover • First use parser to convert into logic. • Then use COGEX, which is a modified version of OTTER. • The prover requires a set of clauses called the “set of support” which is used to initiate the search for inferences. • The set of support is loaded with the negated form of the hypothesis as well as the predicates that make up the text passage. • Another list is required called the usable list, contains clauses used by OTTER to generate inferences. • The usable list consists of all the axioms that have been generated either automatically or by hand. • World Knowledge Axioms (Manually) • NLP Axioms(SS and SM) • WordNet Lexical Chains • Overall accuracy .551, a lot of errors in the parsing stage.

  18. Comparing task accuracy CD – Comparable Documents IE – Information Extraction QA – Question Answering IR – Information Retrieval MT – Machine Translation RC – Reading Comprehension PP – Paraphrasing

  19. Rsullt comparison

  20. Future work • Search for a candidate parser to transform NL to first order logic. • Use the largest set of KB to caputre similarity. • Search for a robust theorem prover.

  21. References • The first PASCAL Recognising Textual Entailment Challenge (RTE I) • Ido Dagan, Oren Glickman and Bernardo Magnini. The PASCAL Recognising Textual Entailment Challenge.In Proceedings of the PASCAL Challenges Workshop on Recognising Textual Entailment, 2005. • http://www.cs.biu.ac.il/~glikmao/rte05/index.html

More Related