
Answer Validation Exercise - AVE QA subtrack at Cross-Language Evaluation Forum 2007




Presentation Transcript


  1. Answer Validation Exercise - AVE QA subtrack at Cross-Language Evaluation Forum 2007. Thanks to… Main task organizing committee: UNED (coord.): Anselmo Peñas, Álvaro Rodrigo, Valentín Sama, Felisa Verdejo

  2. What? Answer Validation Exercise: validate the correctness of the answers… given by the participants at CLEF QA 2007

  3. AVE 2006: an RTE exercise. The question and the exact answer returned by the QA system are turned into affirmative form to build the Hypothesis; the supporting snippet & doc ID provide the Text. If the text semantically entails the hypothesis, then the answer is expected to be correct.

  4. Answer Validation. The question and the candidate answer returned by Question Answering go through Automatic Hypothesis Generation to produce the Hypothesis; the Hypothesis and the Supporting Text are passed to Textual Entailment, which decides whether the answer is correct, or not correct / not enough evidence. AVE 2006 evaluated only the Textual Entailment step; in AVE 2007 the whole Answer Validation Exercise is evaluated as a black box.
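
A minimal sketch of this pipeline, assuming a naive pattern-based hypothesis generator and a placeholder entailment check based on word overlap; the names build_hypothesis, entails and validate are illustrative, not part of the AVE distribution, and real participating systems use proper RTE engines for the entailment step:

    import re

    def build_hypothesis(question: str, answer: str) -> str:
        """Combine a question and a candidate answer into an affirmative statement.
        Very naive pattern-based rewriting; real systems use syntactic/semantic analysis."""
        q = question.strip().rstrip("?")
        m = re.match(r"(?i)what is (.+)", q)
        if m:
            # "What is X?" + "Y" -> "X is Y"
            return f"{m.group(1)} is {answer}"
        # Fallback: append the answer to the question stem
        return f"{q} {answer}"

    def entails(text: str, hypothesis: str) -> bool:
        """Placeholder for a textual entailment engine (here: lexical overlap only)."""
        t_words = set(text.lower().split())
        h_words = set(hypothesis.lower().split())
        return len(h_words & t_words) / max(len(h_words), 1) > 0.7

    def validate(question: str, answer: str, support: str) -> str:
        hypothesis = build_hypothesis(question, answer)
        return "correct" if entails(support, hypothesis) else "not correct or not enough evidence"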

  5. Answer Validation Exercise • AVE 2006: not possible to quantify the potential gain that AV modules give to QA systems • Change in AVE 2007 methodology: group answers by question • Systems must validate all the answers, but select one per question
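
The "validate all, select one" requirement could look like the sketch below; the score-based selection criterion is only an example, since each participating system chooses its own, and the names select_per_question and judgements are hypothetical:

    from collections import defaultdict

    def select_per_question(judgements):
        """judgements: list of (q_id, a_id, accepted, score) tuples from an AV system.
        Every answer in a question group is validated; exactly one accepted answer
        per question is selected, here the one with the highest score."""
        groups = defaultdict(list)
        for q_id, a_id, accepted, score in judgements:
            groups[q_id].append((a_id, accepted, score))
        selected = {}
        for q_id, answers in groups.items():
            accepted_answers = [a for a in answers if a[1]]
            if accepted_answers:
                selected[q_id] = max(accepted_answers, key=lambda a: a[2])[0]
        return selected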

  6. AVE 2007 Collections

    <q id="116" lang="EN">
      <q_str>What is Zanussi?</q_str>
      <a id="116_1" value="">
        <a_str>was an Italian producer of home appliances</a_str>
        <t_str doc="Zanussi">Zanussi For the Polish film director, see Krzysztof Zanussi. For the hot-air balloon, see Zanussi (balloon). Zanussi was an Italian producer of home appliances that in 1984 was bought</t_str>
      </a>
      <a id="116_2" value="">
        <a_str>who had also been in Cassibile since August 31</a_str>
        <t_str doc="en/p29/2998260.xml">Only after the signing had taken place was Giuseppe Castellano informed of the additional clauses that had been presented by general Ronald Campbell to another Italian general, Zanussi, who had also been in Cassibile since August 31.</t_str>
      </a>
      <a id="116_4" value="">
        <a_str>3</a_str>
        <t_str doc="1618911.xml">(1985) 3 Out of 5 Live (1985) What Is This?</t_str>
      </a>
    </q>
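
A collection file in this format could be read with Python's standard xml.etree.ElementTree, roughly as sketched below; the function name load_ave_collection and the file name ave2007_en.xml are only illustrative:

    import xml.etree.ElementTree as ET

    def load_ave_collection(path: str):
        """Parse an AVE 2007 collection file into (q_id, a_id, question, answer, support) tuples."""
        triples = []
        root = ET.parse(path).getroot()
        for q in root.iter("q"):
            question = q.findtext("q_str", default="").strip()
            for a in q.iter("a"):
                answer = a.findtext("a_str", default="").strip()
                support = a.findtext("t_str", default="").strip()
                triples.append((q.get("id"), a.get("id"), question, answer, support))
        return triples

    # Example usage (hypothetical file name):
    # for q_id, a_id, question, answer, support in load_ave_collection("ave2007_en.xml"):
    #     print(a_id, question, "->", answer)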

  7. Collections • Remove duplicated answers inside the same question group • Discard NIL answers, void answers and answers with overly long supporting snippets • This processing led to a reduction in the number of answers to be validated
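
A sketch of this preprocessing over the tuples produced by the loader above; the length threshold MAX_SNIPPET_CHARS is a hypothetical value, since the actual limit used by the organizers is not stated on the slide:

    MAX_SNIPPET_CHARS = 700  # hypothetical threshold, not the organizers' real limit

    def preprocess(triples):
        """Drop NIL/void answers, answers with overly long snippets, and duplicates per question group."""
        seen = set()
        kept = []
        for q_id, a_id, question, answer, support in triples:
            if not answer or answer.upper() == "NIL":
                continue                           # void or NIL answer
            if len(support) > MAX_SNIPPET_CHARS:
                continue                           # supporting snippet too long
            key = (q_id, answer.lower())
            if key in seen:
                continue                           # duplicated answer inside the same question group
            seen.add(key)
            kept.append((q_id, a_id, question, answer, support))
        return kept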

  8. Collections (# answers to validate) Available for CLEF participants at nlp.uned.es/QA/ave/

  9. Evaluation • Collections are not balanced • Approach: detect whether there is enough evidence to accept an answer • Measures: precision, recall and F measure over ACCEPTED answers • Baseline system: accept all answers
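
These measures can be made concrete with a small sketch; gold labels and system decisions are assumed to be parallel lists of booleans, and the names below are illustrative:

    def ave_scores(gold_correct, system_accepted):
        """Precision, recall and F measure over ACCEPTED answers.
        gold_correct[i]    -- True if answer i is actually correct
        system_accepted[i] -- True if the AV system accepted answer i"""
        accepted = sum(system_accepted)
        correct = sum(gold_correct)
        true_pos = sum(g and s for g, s in zip(gold_correct, system_accepted))
        precision = true_pos / accepted if accepted else 0.0
        recall = true_pos / correct if correct else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f

    # The "accept all answers" baseline: recall is 1 and precision equals the
    # proportion of correct answers in the (unbalanced) collection.
    # baseline = ave_scores(gold, [True] * len(gold))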

  10. Evaluation Precision, Recall and F measure over correct answers for English

  11. Comparing AV systems' performance with QA systems (German)

  12. Techniques reported at AVE 2007 • 10 reports, all of them reporting an RTE approach

  13. Conclusion • Evaluation in a real environment • Real systems' outputs -> AVE input • Developed methodologies • Build collections from QA responses • Evaluate in a chain with the QA Track • Compare results with QA systems • New testing collections for the QA and RTE communities • In 7 languages, not only English

  14. Conclusion • 9 groups, 16 systems, 4 languages • All systems based on Textual Entailment • 5 out of 9 groups participated in QA • Introduction of RTE techniques in QA • More NLP • More Machine Learning • Systems based on syntactic or semantic analysis perform Automatic Hypothesis Generation • Combination of the question and the answer • In some cases directly in logical form
