
Learning with Latent Alignment Structures

An exploration of how to learn the syntactic and semantic relations between two pieces of text for tasks such as Question Answering and Textual Entailment. The talk addresses the challenges of modeling latent alignments, syntax, and semantics, describes what an ideal model should do, and presents two models with experiments, comparing their strengths and outlining future work.





Presentation Transcript


  1. Learning with Latent Alignment Structures Quasi-synchronous Grammar and Tree-edit CRFs for Question Answering and Textual Entailment Mengqiu Wang Joint work with Chris Manning, Noah Smith

  2. Task definition
     • At a high level: learning the syntactic and semantic relations between two pieces of text
     • Application-specific definition of the relations
     • Question Answering
       Q: Who is the leader of France?
       A: Bush later met with French President Jacques Chirac
     • Machine Translation
       C: 温总理昨天会见了日本首相安倍晋三。
       E: Premier Wen Jiabao met with Japanese Prime Minister Shinzo Abe yesterday.
     • Summarization
       T: US rounds up 400 Saddam diehards as group claims anti-US attacks in Iraq.
       S: US rounded up 400 people in Iraq.
     • Textual Entailment (IE, IR, QA, SUM)
       Txt: Responding to Scheuer's comments in La Repubblica, the prime minister's office said the analysts' allegations, "beyond being false, are also absolutely incompatible with the contents of the conversation between Prime Minister Silvio Berlusconi and U.S. Ambassador to Rome Mel Sembler."
       Hyp: Mel Sembler represents the U.S.

  3. The Challenges
     • Latent alignment structure
       • QA: Who is the leader of France? / Bush later met with French President Jacques Chirac
       • MT: 温总理昨天会见了日本首相安倍晋三。 / Premier Wen Jiabao met with Japanese Prime Minister Shinzo Abe yesterday.
       • Sum: US rounds up 400 Saddam diehards as group claims anti-US attacks in Iraq. / US rounded up 400 people in Iraq.
       • RTE: Responding to … the conversation between Prime Minister Silvio Berlusconi and U.S. Ambassador to Rome Mel Sembler." / Mel Sembler represents the U.S.

  4. Other modeling challenges: question answer ranking
     Question: Who is the leader of France?
     Candidate answers:
       1. Bush later met with French president Jacques Chirac.
       2. Henri Hadjenberg, who is the leader of France's Jewish community, …
       3. …

  5. Semantic Transformations
     • Q: “Who is the leader of France?”
     • A: Bush later met with French president Jacques Chirac.

  6. Syntactic Transformations (figure: dependency trees of "Who is the leader of France?" and "Bush met with French president Jacques Chirac", with the corresponding mod edges highlighted)

  7. Syntactic Variations (figure: dependency trees of "Who is the leader of France?" and "Henri Hadjenberg, who is the leader of France's Jewish community", with the corresponding mod edges highlighted)

  8. What's been done?
     • The latent alignment problem
       • Instead of treating alignment as a latent variable, treat it as a separate task: first find the best alignment, then proceed with the rest of the task.
       • Pros: usually simple and efficient.
       • Cons: not very robust; there is no way to correct alignment errors in later steps.
     • Modeling syntax and semantics
       • Extract features from syntactic parse trees and semantic resources, then feed them to a linear classifier. Syntax and semantics enrich the feature space, but there is no principled way to make use of syntax.
       • Pros: no need to worry much about trees.
       • Cons: ad hoc.

  9. What I think an ideal model should do
     • Carry alignment uncertainty into the final task
       • Treat alignments as latent variables and jointly learn the proper alignment structure and the overall task.
       • In other words, model the distribution over alignments and sum out all possible alignments at decoding time.
     • Syntax-based and feature-rich models
       • Directly model syntax.
       • Enable the use of rich semantic features and features from other world-knowledge resources.
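To make the contrast between committing to one alignment and summing over all of them concrete, here is a toy Python sketch. It assumes each question word aligns to exactly one answer word and that `score[q][a]` is a hypothetical log-potential for that link; it enumerates alignments by brute force, which only works for tiny inputs and is not how QG or TE-CRFs actually compute the sum.

```python
import itertools
import math


def alignment_log_scores(score):
    """Yield (alignment, total score) for every alignment mapping each
    question word q to one answer word a (score[q][a] is a toy log-potential)."""
    n_q, n_a = len(score), len(score[0])
    for alignment in itertools.product(range(n_a), repeat=n_q):
        yield alignment, sum(score[q][a] for q, a in enumerate(alignment))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginal_log_score(score):
    """Latent-variable view: sum out every alignment."""
    return log_sum_exp([s for _, s in alignment_log_scores(score)])


def best_alignment_log_score(score):
    """Pipeline view: commit to the single best alignment."""
    return max(s for _, s in alignment_log_scores(score))
```

With `A = [[2.0, -5.0, -5.0], [0.0, 0.0, 0.0]]` and `B = [[1.5, 1.5, 1.5], [0.0, 0.0, 0.0]]`, the single best alignment favors candidate A (2.0 vs. 1.5), while the marginal favors B, whose evidence is spread over several plausible alignments.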

  10. Road map
     • Present two models that address the issues raised above
       • 1: A model based on Quasi-synchronous Grammar (EMNLP '07), with experiments on the Question Answering task
       • 2: A tree-edit CRF model (current work), with experiments on RTE
     • Discuss and compare the two models: modeling power, pros and cons
     • Future work

  11. Switching gears…
     • Quasi-synchronous Grammar for Question Answering

  12. Tree-edit CRFs for RTE
     • An extension of McCallum et al.'s (UAI 2005) work on CRFs for finite-state string edit distance
     • Key attractions:
       • Models the transformation of dependency parse trees (thus directly models syntax), unlike McCallum et al. '05, which only models word strings
       • Discriminatively trained (unlike QG, which is generative)
       • Trained on both positive and negative sentence pairs (QG is trained only on positive Q/A pairs)
       • The underlying graphical model is an undirected CRF (QG is essentially a directed Bayes net)
       • A joint model over alignments (vs. local alignment models in QG)
       • Feature-rich
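The core computation behind an edit-distance CRF is a dynamic program that sums the scores of every edit sequence instead of keeping only the single cheapest one. The Python sketch below shows it for flat token sequences, as in the string case; the per-operation scores are hypothetical stand-ins for weighted feature sums, and extending the recursion to dependency trees and to multiple FSM states (as TE-CRFs do) is not shown.

```python
import math

NEG_INF = float("-inf")


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


# Hypothetical log-potentials for each edit operation; in a CRF these would be
# weighted sums of features such as those listed under "Features for RTE".
def sub_score(a, b):
    return 1.0 if a == b else -1.0


def del_score(a):
    return -0.5


def ins_score(b):
    return -0.5


def log_partition(x, y):
    """Log of the summed scores of all delete/insert/substitute sequences
    that turn sequence x into sequence y (forward algorithm over the edit lattice)."""
    m, n = len(x), len(y)
    alpha = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    alpha[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i > 0:  # delete x[i-1]
                alpha[i][j] = log_add(alpha[i][j], alpha[i - 1][j] + del_score(x[i - 1]))
            if j > 0:  # insert y[j-1]
                alpha[i][j] = log_add(alpha[i][j], alpha[i][j - 1] + ins_score(y[j - 1]))
            if i > 0 and j > 0:  # substitute x[i-1] with y[j-1]
                alpha[i][j] = log_add(alpha[i][j], alpha[i - 1][j - 1] + sub_score(x[i - 1], y[j - 1]))
    return alpha[m][n]
```

For example, `log_partition("who is the leader".split(), "Chirac is the French president".split())` returns the log of the total score mass over all edit sequences between the two token lists. Replacing `log_add` with `max` would recover a Viterbi-style single best edit sequence.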

  13. TE-CRFs model in detail
     • First, let's look at the correspondence between alignments (with constraints) and edit operations

  14. (Figure: dependency trees of Q: "Who is the leader of France?" and A: "Bush met with French president Jacques Chirac", each under a $ root node. Nodes carry the word, POS tag, and entity type, e.g. who/WP/qword, leader/NN, France/NNP/location, Bush/NNP/person, Jacques Chirac/NNP/person, president/NN, French/JJ/location; nodes of the two trees are linked by the edit operations substitute, insert, and delete.)

  15. TE-CRFs model in detail
     • Each valid sequence of tree edit operations that transforms one tree into the other corresponds to an alignment. A tree edit operation sequence is modeled as a sequence of transitions among a set of states (S1, S2, S3) in an FSM, where each transition is labeled with an edit operation: D (delete), S (substitute), or I (insert).
     (Figure: the FSM, and a path through it for the edit sequence substitute, insert, substitute, delete, substitute, substitute, …)
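The correspondence between edit sequences and alignments can be sketched for the flat-sequence case. In this hypothetical helper, the edit sequence is read left to right: a substitute links the current tokens on both sides, a delete leaves a source token unaligned, and an insert leaves a target token unaligned. The additional constraints that tree edits impose are not modeled here.

```python
from typing import List, Optional, Tuple


def edits_to_alignment(ops: List[str], m: int, n: int) -> Optional[List[Tuple[int, int]]]:
    """Map a left-to-right edit sequence between a length-m source and a
    length-n target to the (source, target) index pairs it aligns.
    Returns None if the sequence does not consume both sides exactly."""
    i = j = 0  # next unconsumed source / target position
    pairs = []
    for op in ops:
        if op == "substitute":
            if i >= m or j >= n:
                return None
            pairs.append((i, j))
            i += 1
            j += 1
        elif op == "delete":  # source token i stays unaligned
            if i >= m:
                return None
            i += 1
        elif op == "insert":  # target token j stays unaligned
            if j >= n:
                return None
            j += 1
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return pairs if (i, j) == (m, n) else None
```

Applied to two five-token sequences, the slide's edit sequence `substitute, insert, substitute, delete, substitute, substitute` yields the alignment `[(0, 0), (1, 2), (3, 3), (4, 4)]`.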

  16. FSM (figure: the FSM unrolled over the edit sequence substitute, insert, substitute, delete, substitute, substitute, …, with one of the states S1, S2, S3 at each step)
     • This is for one edit operation sequence. There are many other valid edit sequences, e.g.:
       • substitute insert delete substitute substitute substitute
       • insert substitute substitute delete substitute substitute
       • substitute insert substitute substitute delete substitute
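How many valid edit sequences are there? For flat sequences (ignoring tree constraints), each edit sequence is a monotone lattice path built from delete, insert, and substitute steps, so the count satisfies a simple recurrence whose values are the Delannoy numbers. A short Python sketch, with a hypothetical function name:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_edit_sequences(m: int, n: int) -> int:
    """Number of delete/insert/substitute sequences that turn a length-m
    sequence into a length-n sequence (the Delannoy number D(m, n))."""
    if m == 0 or n == 0:
        return 1  # only inserts, or only deletes, remain
    return (count_edit_sequences(m - 1, n)          # delete
            + count_edit_sequences(m, n - 1)        # insert
            + count_edit_sequences(m - 1, n - 1))   # substitute
```

`count_edit_sequences(5, 5)` is already 1683, and the count grows exponentially with length, which is why the sum over edit sequences is computed by dynamic programming rather than by enumeration.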

  17. FSM cont. (figure: Start connects through ε-transitions to a Positive State Set and a Negative State Set, each containing states S1, S2, S3 with D, S, I transitions among them; both sets connect through ε-transitions to Stop)

  18. FSM transitions (figure: the lattice of paths from Start to Stop through either the Positive State Set or the Negative State Set, with each path visiting states S1, S2, S3 at successive edit steps)
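The positive and negative state sets turn the edit model into a classifier: the probability that a pair is positive is the share of total path mass that flows through the positive set. The Python sketch below assumes each set is collapsed to a single state with its own hypothetical scoring functions (the slides use three states per set) and that the ε-transitions out of Start and into Stop carry no score.

```python
import math

NEG_INF = float("-inf")


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_partition(x, y, sub_score, del_score, ins_score):
    """Log of the summed scores of all edit sequences turning x into y."""
    m, n = len(x), len(y)
    alpha = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    alpha[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i > 0:
                alpha[i][j] = log_add(alpha[i][j], alpha[i - 1][j] + del_score(x[i - 1]))
            if j > 0:
                alpha[i][j] = log_add(alpha[i][j], alpha[i][j - 1] + ins_score(y[j - 1]))
            if i > 0 and j > 0:
                alpha[i][j] = log_add(alpha[i][j], alpha[i - 1][j - 1] + sub_score(x[i - 1], y[j - 1]))
    return alpha[m][n]


# Hypothetical parameters: each state set has its own scoring functions.
POSITIVE = {
    "sub_score": lambda a, b: 1.0 if a == b else -2.0,  # rewards matches, penalizes mismatches
    "del_score": lambda a: -1.0,
    "ins_score": lambda b: -0.5,
}
NEGATIVE = {
    "sub_score": lambda a, b: 0.0,  # indifferent to whether words match
    "del_score": lambda a: -0.5,
    "ins_score": lambda b: -0.5,
}


def prob_positive(x, y):
    """P(positive | x, y) = Z_pos / (Z_pos + Z_neg), computed in log space."""
    log_z_pos = log_partition(x, y, **POSITIVE)
    log_z_neg = log_partition(x, y, **NEGATIVE)
    return 1.0 / (1.0 + math.exp(log_z_neg - log_z_pos))
```

With these toy parameters, `prob_positive(["a", "b"], ["a", "b"])` comes out around 0.72 and `prob_positive(["a", "b"], ["c", "d"])` around 0.11: identical sequences send most of their mass through the positive set.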

  19. Parameterization (figure: a substitute transition between states S1 and S2, annotated "positive or negative" and "positive and negative")

  20. Training using EM
     • Lower-bound the log-likelihood using Jensen's inequality
     • E-step
     • M-step: optimized using L-BFGS
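For reference, one standard way to write the EM bound for a conditional model with latent alignments is given below. The notation is assumed here rather than taken from the talk: $(x_k, y_k)$ is the $k$-th training pair with its entailment label, $a$ ranges over edit sequences (alignments), and $q_k$ is any distribution over them.

```latex
\begin{align}
\ell(\theta) &= \sum_k \log p_\theta(y_k \mid x_k)
             = \sum_k \log \sum_a p_\theta(y_k, a \mid x_k) \\
&\ge \sum_k \sum_a q_k(a) \log \frac{p_\theta(y_k, a \mid x_k)}{q_k(a)}
  \qquad \text{(Jensen's inequality)} \\
\text{E-step:}\quad & q_k(a) = p_{\theta^{\mathrm{old}}}(a \mid x_k, y_k) \\
\text{M-step:}\quad & \theta^{\mathrm{new}} = \arg\max_\theta \sum_k \sum_a q_k(a) \log p_\theta(y_k, a \mid x_k)
\end{align}
```

If the model is log-linear, $\log p_\theta(y, a \mid x) = \theta \cdot f(x, y, a) - \log Z_\theta(x)$, where $Z_\theta(x)$ sums over both labels and all edit sequences. The M-step objective is then concave and differentiable in $\theta$, which makes a gradient-based optimizer such as L-BFGS a natural fit.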

  21. Features for RTE
     • Substitution
       • Same: Word/WordWithNE/Lemma/NETag/Verb/Noun/Adj/Adv/Other
       • Sub/MisSub: Punct/Stopword/ModalWord
       • Antonym/Hypernym/Synonym/NomBank/Country
       • Different: NE/Pos
       • Unrelated words
     • Delete
       • Stopword/Punct/NE/Other/Polarity/Quantifier/Likelihood/Conditional/If
     • Insert
       • Stopword/Punct/NE/Other/Polarity/Quantifier/Likelihood/Conditional/If
     • Tree
       • RootAligned/RootAlignedSameWord
       • Parent, child, DepRel triple match/mismatch
     • Date/Time/Numerical
       • DateMismatch, hasNumDetMismatch, normalizedFormMismatch
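As a rough illustration of how such indicator features could be computed for a single substitution and turned into a score, here is a Python sketch. The `Token` fields, the tiny stopword list, and the feature names are hypothetical; WordNet-, NomBank-, and other resource-based features are left out, and the rule that "Unrelated" fires only when nothing else does is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict

STOPWORDS = {"the", "a", "an", "of", "is", "with"}  # tiny illustrative list


@dataclass(frozen=True)
class Token:
    word: str
    lemma: str
    pos: str  # Penn Treebank tag, e.g. "NN"
    ne: str   # entity type, e.g. "location"; "O" for none


def substitution_features(a: Token, b: Token) -> Dict[str, float]:
    """Indicator features for substituting token a with token b."""
    feats: Dict[str, float] = {}
    if a.word.lower() == b.word.lower():
        feats["Sub:Same-Word"] = 1.0
    if a.lemma == b.lemma:
        feats["Sub:Same-Lemma"] = 1.0
    if a.ne != "O" and b.ne != "O":
        feats["Sub:Same-NETag" if a.ne == b.ne else "Sub:Different-NE"] = 1.0
    if a.pos != b.pos:
        feats["Sub:Different-Pos"] = 1.0
    if a.word.lower() in STOPWORDS and b.word.lower() in STOPWORDS:
        feats["Sub:Stopword"] = 1.0
    if not feats:  # assumption: fall back to "unrelated"
        feats["Sub:Unrelated"] = 1.0
    return feats


def edit_score(feats: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-potential of an edit: dot product of features and weights."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())
```

For France/NNP/location substituted by French/JJ/location, the sketch fires `Sub:Same-NETag` and `Sub:Different-Pos`. For leader/NN vs. president/NN it fires only `Sub:Unrelated`, which is the kind of gap the lexical-semantic features (Synonym, Hypernym, and so on) are meant to fill.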

  22. Tree-edit CRFs for Textual Entailment
     • Preliminary results
       • Trained on the RTE2 dev set, tested on the RTE2 test set
       • Model taken after 50 EM iterations
       • Accuracy 0.6275, MAP 0.6407
     • RTE2 official results
       • Hickl (LCC): accuracy 0.7538, MAP 0.8082
       • Tatu (LCC): accuracy 0.7375, MAP 0.7133
       • Zanzotto (Milan & Rome): accuracy 0.6388, MAP 0.6441
       • Adams (Dallas): accuracy 0.6262, MAP 0.6282
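The two scores on this slide can be computed as sketched below, assuming the usual RTE definitions: accuracy over binary entailment decisions, and average precision over pairs ranked by decreasing confidence in entailment, averaging the precision at each rank that holds a true entailment. The function names are illustrative.

```python
from typing import Sequence


def accuracy(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Fraction of pairs whose entailment decision matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def average_precision(gold: Sequence[bool], confidence: Sequence[float]) -> float:
    """Rank pairs by decreasing confidence that they entail, then average
    the precision at every rank where a gold entailing pair appears."""
    ranked = sorted(zip(confidence, gold), key=lambda cg: cg[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, is_entailment) in enumerate(ranked, start=1):
        if is_entailment:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0
```

For gold labels `[True, False, True, False]` with confidences `[0.9, 0.8, 0.7, 0.1]`, the entailing pairs sit at ranks 1 and 3, so average precision is (1/1 + 2/3) / 2 ≈ 0.83. Unlike accuracy, this measure rewards a model whose confidence scores order the pairs well, even when its binary decisions are wrong.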

  23. Comparison: QG vs. TE-CRFs
     • QG
       • Generative
       • Directed, Bayes net, local
       • Allows arbitrary swapping in alignment
       • Allows limited use of semantic features (lexical-semantic log-linear model in a mixture model)
       • Computationally cheaper
     • TE-CRFs
       • Discriminative
       • Undirected, CRFs, global
       • No swapping: can't do substitutions that involve swapping (can be extended; see future work)
       • Allows arbitrary semantic features
       • Computationally more expensive

  24. Future work
     • QG
       • Train discriminatively using Noah Smith's Contrastive Estimation
       • Higher-order Markovization
       • Run RTE experiments
     • TE-CRFs
       • Constrained unordered trees
       • Fancy edit operations (e.g., substituting subtrees)
       • Run QA and MT alignment experiments

  25. Thank you! Questions?
