
Matching sets of parse trees for answering multi-sentence questions

Matching sets of parse trees for answering multi-sentence questions. Boris A.Galitsky 1 , Dmitry Ilvovsky 2 , Sergei O. Kuznetsov 2 and Fedor Strok 2 1 Knowledge Trail Inc. San Jose CA USA 2 Higher School of Economics, Moscow Russia bgalitsky@hotmail.com ; dilv_ru@yahoo.com ;






  1. Matching sets of parse trees for answering multi-sentence questions. Boris A. Galitsky (1), Dmitry Ilvovsky (2), Sergei O. Kuznetsov (2) and Fedor Strok (2). (1) Knowledge Trail Inc., San Jose, CA, USA; (2) Higher School of Economics, Moscow, Russia. bgalitsky@hotmail.com; dilv_ru@yahoo.com; skuznetsov@hse.ru; fdr.strok@gmail.com

  2. Outline • Motivations • Search application • Introducing Parse Thicket • Generalizing Parse Thickets • Parse Thicket for multi-sentence search • Evaluation • Conclusions

  3. Motivations • Parse trees are well studied. What about using them for whole paragraphs? • We are a bit mad about structures. Can we build some for measuring similarity? • Search engineers do not want to learn NLP. Can we help them?

  4. Applications It sounds unreal but short texts are… useful • Search and Q/A • Content analysis: • Classification • Categorization • Content generation • Recommendations • Advertisement

  5. Answer in multiple sentences? • No answer includes ‘pulmonologist’

  6. Matching keywords. Bad

  7. “String phrases”. A little better

  8. What to do? Structures will help us!

  9. Similarity between question and answer • Baseline: bag-of-words approach, which computes the set of common keywords/n-grams and their frequencies. • Pair-wise sentence matching: syntactic generalization for each pair of sentences and summation of the resulting similarities [Galitsky et al., 2012]. • Paragraph-paragraph matching: similarity as a generalization operation.

  10. Finding similarity. Small example: two paragraphs to be generalized (^). "Iran refuses to accept the UN proposal to end the dispute over work on nuclear weapons", "UN nuclear watchdog passes a resolution condemning Iran for developing a second uranium enrichment site in secret", "A recent IAEA report presented diagrams that suggested Iran was secretly working on nuclear weapons", "Iran envoy says its nuclear development is for peaceful purpose, and the material evidence against it has been fabricated by the US" ^ "UN passes a resolution condemning the work of Iran on nuclear weapons, in spite of Iran claims that its nuclear research is for peaceful purpose", "Envoy of Iran to IAEA proceeds with the dispute over its nuclear program and develops an enrichment site in secret", "Iran confirms that the evidence of its nuclear weapons program is fabricated by the US and proceeds with the second uranium enrichment site"

  11. Keywords: topic with no details Iran, UN, proposal, dispute, nuclear, weapons, passes, resolution, developing, enrichment, site, secret, condemning, second, uranium
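The keyword baseline can be sketched as a plain set intersection. This is a minimal illustration, not the authors' implementation: the class name `KeywordOverlap`, the tiny stop-word list, and the letter-only tokenizer are all assumptions made for the example.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical bag-of-words baseline: keywords shared by two texts. */
final class KeywordOverlap {
    // Assumed stop-word list, far smaller than a real one.
    private static final Set<String> STOP_WORDS = Set.of(
            "the", "a", "to", "on", "of", "in", "for", "is", "that",
            "its", "and", "over", "with", "by", "it", "has", "been", "was");

    /** Lower-cases the text, splits on non-letters, and drops stop words. */
    static Set<String> keywords(String text) {
        Set<String> result = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    /** Keywords present in both texts, ordered by first appearance in the first text. */
    static Set<String> commonKeywords(String first, String second) {
        Set<String> common = keywords(first);
        common.retainAll(keywords(second));
        return common;
    }

    public static void main(String[] args) {
        String first = "Iran refuses to accept the UN proposal to end the dispute over work on nuclear weapons";
        String second = "UN passes a resolution condemning the work of Iran on nuclear weapons";
        System.out.println(commonKeywords(first, second)); // [iran, un, work, nuclear, weapons]
    }
}
```

The result names the topic but keeps no word order or phrase structure, which is the limitation the slide points at.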

  12. Improvement: pair-wise sentence generalization [NN-work IN-* IN-on JJ-nuclear NNS-weapons ], [DT-the NN-dispute IN-over JJ-nuclear NNS-* ], [VBZ-passes DT-a NN-resolution ], [VBG-condemning NNP-iran IN-* ], [VBG-developing DT-* NN-enrichment NN-site IN-in NN-secret ], [DT-* JJ-second NN-uranium NN-enrichment NN-site ]], [VBZ-is IN-for JJ-peaceful NN-purpose ], [DT-the NN-evidence IN-* PRP-it ], [VBN-* VBN-fabricated IN-by DT-the NNP-us ]
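A simplified sketch of generalizing two phrases in the `POS-lemma` notation used above. It assumes each token is written as `POS-lemma`, aligns the two phrases with a small dynamic program, keeps a token when tag and lemma agree, and emits `POS-*` when only the tag agrees. The class `PhraseGeneralizer` and the 2/1/0 scoring are hypothetical; the actual syntactic generalization component also matches on lemmas across different tags and returns a set of maximal sub-phrases, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified generalization of two POS-tagged phrases. */
final class PhraseGeneralizer {
    /** Part-of-speech prefix of a "POS-lemma" token (assumes the token contains '-'). */
    static String pos(String token) {
        return token.substring(0, token.indexOf('-'));
    }

    /** 2 if tag and lemma agree, 1 if only the tag agrees, 0 otherwise. */
    static int score(String a, String b) {
        if (a.equals(b)) return 2;
        return pos(a).equals(pos(b)) ? 1 : 0;
    }

    /** Order-preserving alignment with maximal total score, rendered as a generalized phrase. */
    static List<String> generalize(List<String> p, List<String> q) {
        int n = p.size(), m = q.size();
        int[][] best = new int[n + 1][m + 1]; // best[i][j]: best score for p[i..] vs q[j..]
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                int s = score(p.get(i), q.get(j));
                int match = s > 0 ? s + best[i + 1][j + 1] : 0;
                best[i][j] = Math.max(match, Math.max(best[i + 1][j], best[i][j + 1]));
            }
        }
        List<String> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) { // walk the table to recover one optimal alignment
            int s = score(p.get(i), q.get(j));
            if (s > 0 && best[i][j] == s + best[i + 1][j + 1]) {
                result.add(s == 2 ? p.get(i) : pos(p.get(i)) + "-*");
                i++;
                j++;
            } else if (best[i][j] == best[i + 1][j]) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> p = List.of("DT-a", "JJ-second", "NN-uranium", "NN-enrichment", "NN-site", "IN-in", "NN-secret");
        List<String> q = List.of("DT-an", "NN-enrichment", "NN-site", "IN-in", "NN-secret");
        System.out.println(generalize(p, q)); // [DT-*, NN-enrichment, NN-site, IN-in, NN-secret]
    }
}
```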

  13. Paragraph-paragraph generalization [NN-Iran VBG-developing DT-* NN-enrichment NN-site IN-in NN-secret ] [NN-generalization-<UN/nuclear watchdog> * VB-pass NN-resolution VBG-condemning NN-Iran] [NN-generalization-<Iran/envoy of Iran> Communicative_action DT-the NN-dispute IN-over JJ-nuclear NNS-*] [Communicative_action - NN-work IN-of NN-Iran IN-on JJ-nuclear NNS-weapons] [NN-generalization-<Iran/envoy to UN> Communicative_action NN-Iran NN-nuclear NN-* VBZ-is IN-for JJ-peaceful NN-purpose] [Communicative_action - NN-generalize <work/develop> IN-of NN-Iran IN-on JJ-nuclear NNS-weapons] [NN-generalization-<Iran/envoy to UN> Communicative_action NN-evidence IN-against NN-Iran NN-nuclear VBN-fabricated IN-by DT-the NNP-us ] condemn^proceed [enrichment site] <leads to> suggest^condemn [work Iran nuclear weapon]

  14. Introducing Parse Thicket • Representation of a linguistic structure of a paragraph of text. • Syntactic information + discourse. • Graph structure: parse trees + additional arcs for inter-sentence relationship between parse tree nodes for words.
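One way to picture the graph described above is a container of word nodes plus labelled arcs, where an arc may connect words of different sentences. The sketch below is an assumption-laden illustration: the class `ParseThicketSketch`, its fields, and the arc labels `syntactic` and `same-entity` are invented for the example and are not taken from the project's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parse-thicket container: word nodes plus labelled arcs. */
final class ParseThicketSketch {
    /** A word occurrence, identified by sentence index and position in that sentence. */
    static final class Node {
        final int sentence;
        final int position;
        final String pos;
        final String word;

        Node(int sentence, int position, String pos, String word) {
            this.sentence = sentence;
            this.position = position;
            this.pos = pos;
            this.word = word;
        }
    }

    /** A labelled arc: a parse-tree edge or an additional discourse relation. */
    static final class Arc {
        final Node from;
        final Node to;
        final String label;

        Arc(Node from, Node to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }

        boolean crossesSentences() {
            return from.sentence != to.sentence;
        }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<Arc> arcs = new ArrayList<>();

    Node addNode(int sentence, int position, String pos, String word) {
        Node node = new Node(sentence, position, pos, word);
        nodes.add(node);
        return node;
    }

    void addArc(Node from, Node to, String label) {
        arcs.add(new Arc(from, to, label));
    }

    /** The arcs that turn a set of separate parse trees into one thicket. */
    List<Arc> interSentenceArcs() {
        List<Arc> result = new ArrayList<>();
        for (Arc arc : arcs) {
            if (arc.crossesSentences()) result.add(arc);
        }
        return result;
    }

    /** Fragments of two sentences from the example paragraph, linked by same-entity arcs. */
    static ParseThicketSketch example() {
        ParseThicketSketch pt = new ParseThicketSketch();
        Node iran0 = pt.addNode(0, 0, "NNP", "Iran");
        Node refuses = pt.addNode(0, 1, "VBZ", "refuses");
        Node un0 = pt.addNode(0, 5, "NNP", "UN");
        Node un1 = pt.addNode(1, 0, "NNP", "UN");
        Node passes = pt.addNode(1, 3, "VBZ", "passes");
        Node iran1 = pt.addNode(1, 7, "NNP", "Iran");
        pt.addArc(refuses, iran0, "syntactic");
        pt.addArc(passes, un1, "syntactic");
        pt.addArc(iran0, iran1, "same-entity");
        pt.addArc(un0, un1, "same-entity");
        return pt;
    }

    public static void main(String[] args) {
        for (Arc arc : example().interSentenceArcs()) {
            System.out.println(arc.from.word + " -" + arc.label + "-> " + arc.to.word);
        }
    }
}
```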

  15. Semantic relations in Parse Thicket • Taxonomy, coreferences • Anaphora • Same entity • Hyponym/Hyperonym • Siblings • etc • Rhetorical Structure Theory (RST) [Mann] • Speech Act Theory (SpAct) [Searle]

  16. Why Parse Thickets? • Least general generalization in terms of structural representations of text paragraphs • Similarity between two texts as a generalization of their PTs • Exploring machine learning on structures [Moschitti, Sun] at the level of paragraphs

  17. Rhetorical Structure Theory • Structure of text in terms of relations that hold between parts of text • Text patterns such as nucleus/satellite structure • Relations between clauses in text which might not be syntactically linked.

  18. Speech Act Theory • Describes the structure of a dialogue • Includes a vocabulary of communicative actions (verbs): • Suggest • Condemn • Dispute • Convey • etc.

  19. Generalization of 2 PT. Graphs • PT -> graph • Generalization is the set of all maximal common subgraphs • Costs a lot: • NP-complete • Could be sped up by exploiting the specifics of a PT [work in progress] • Explored nevertheless [Galitsky et al., GKR 2013]

  20. Generalization of 2 PT. Phrases • PT -> set of phrases • Regular phrases (NP, VP, etc.) • Thicket phrases (trees in a graph including coreferential and taxonomical arcs) • RST-phrases • CA-phrases • Pair-wise generalization of phrases • Works fast: approximately constant time with a modified inverted index

  21. Generalization of 2 PT: CA example condemn^proceed [enrichment site] <leads to> suggest^condemn [ work Iran nuclear weapon ]

  22. Generalization of 2 PT: RST example Iran nuclear NN – RST-evidence – fabricated by USA

  23. PT for search • Top-N of initial search results • Question -> PT, candidate answer -> PT • Pair-wise generalization between the question and each candidate answer • For each pair: • calculating a score for each sub-phrase in the generalization • taking the maximal score • Re-ranking of search results
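The scoring and re-ranking steps above can be sketched as follows, assuming the generalization of the question with each candidate answer has already been computed as a list of `POS-lemma` phrases. The class `ThicketReranker` and the token weights (nouns 1.0, verbs 0.8, everything else 0.3, wildcards discounted to 20%) are illustrative assumptions, not the weights used by the authors.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical re-ranker: max sub-phrase score per candidate, then sort. */
final class ThicketReranker {
    /** Assumed weights: nouns 1.0, verbs 0.8, other tags 0.3; a wildcard lemma keeps 20%. */
    static double tokenWeight(String token) {
        int dash = token.indexOf('-');
        String pos = token.substring(0, dash);
        String lemma = token.substring(dash + 1);
        double weight = pos.startsWith("NN") ? 1.0 : pos.startsWith("VB") ? 0.8 : 0.3;
        return lemma.equals("*") ? 0.2 * weight : weight;
    }

    /** Score of one generalized sub-phrase: sum of its token weights. */
    static double phraseScore(List<String> phrase) {
        double sum = 0.0;
        for (String token : phrase) sum += tokenWeight(token);
        return sum;
    }

    /** Score of a candidate: the maximal score over the sub-phrases of its generalization. */
    static double candidateScore(List<List<String>> generalization) {
        double best = 0.0;
        for (List<String> phrase : generalization) best = Math.max(best, phraseScore(phrase));
        return best;
    }

    /** Candidate ids by descending score; ties keep the map's (initial ranking) order. */
    static List<String> rerank(Map<String, List<List<String>>> generalizations) {
        List<String> ids = new ArrayList<>(generalizations.keySet());
        ids.sort(Comparator.comparingDouble(
                (String id) -> candidateScore(generalizations.get(id))).reversed());
        return ids;
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> byCandidate = new LinkedHashMap<>();
        byCandidate.put("a1", List.of(List.of("NN-resolution")));
        byCandidate.put("a2", List.of(
                List.of("VBG-developing", "DT-*", "NN-enrichment", "NN-site", "IN-in", "NN-secret"),
                List.of("DT-the", "NN-dispute")));
        byCandidate.put("a3", List.of(List.of("NN-work", "IN-on", "JJ-nuclear", "NNS-weapons")));
        System.out.println(rerank(byCandidate)); // [a2, a3, a1]
    }
}
```

Passing a `LinkedHashMap` keeps the initial search ranking as the tie-breaker, since `List.sort` is stable.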

  24. Evaluation domain • Product recommendation. Reading chats about products and finding relevant information on the web about a particular product. • Travel recommendation. Reading chats about travel and finding relevant information on travel websites. • Facebook recommendation. Reading wall postings and chats and finding a piece of relevant information for friends on the web.

  25. Evaluation results

  26. Search relevance improvement • Unfiltered precision is 58.2%. • Improvement from pair-wise sentence generalization: 6.5%. • PT on phrases for snippets: an additional 4%. • PT on phrases for original sentences: an additional 1.5%.

  27. Project contribution • Syntactic generalization component: https://svn.apache.org/repos/asf/incubator/opennlp/sandbox/opennlp-similarity • Full project: https://code.google.com/p/relevance-based-on-parse-trees/

  28. This project among open source projects: Parse Thickets for search, StanfordNLP, OpenNLP, SOLR

  29. Paragraphs generalization. Ready to be plugged in:
        // generalize a search-result snippet against the query
        SentencePairMatchResult matchRes = sm.assessRelevance(snapshot, searchQuery);
        List<List<ParseTreeChunk>> match = matchRes.getMatchResult();
        // score the resulting list of common sub-phrases
        score = parseTreeChunkListScorer.getParseTreeChunkListScore(match);
        if (score > 1.5) {
            // relevant
        }

  30. Conclusions • Parse Thicket: a structural representation of a text paragraph as a whole. • Search relevance improvement: keywords -> parse trees -> parse thickets • The framework is ready… and is continuously being improved

  31. Thanks!
