Presentation Transcript


  1. Identifying Agreement and Disagreement in Conversational Speech: Use of Bayesian Networks to Model Pragmatic Dependencies Michel Galley, Kathleen McKeown, Julia Hirschberg Columbia University Elizabeth Shriberg SRI International

  2. Motivation • Problem: identification of agreements and disagreements between participants in meetings. • Ultimate goal: automatic summarization. This enables us to generate “minutes” of meetings highlighting the debate that affected each decision.

  3. Example • 4-way classification: AGREE, DISAGREE, BACKCHANNEL, OTHER

  4. Previous work • Decision-tree classifiers [Hillard et al. 03] • CART-style tree learner. • Features local to the utterance: lexical, durational, and acoustic. • Reasonably good accuracy in a 3-way classification (AGREE, DISAGREE, OTHER): • 71% with ASR output; • 82% with accurate transcription.

  5. Extend [Hillard et al. 03] by investigating the effect of context • Empirical questions: • Are preceding agreements/disagreements good predictors for the classification task? • Does the current label (agreement/disagreement) depend on the identity of the addressee? • Should we distinguish preceding labels by the identity of their corresponding addressees? • The studies we report show that preceding context supplies good predictors. • Addressee identification is instrumental to analyzing preceding context.

  6. Agreement/disagreement classification in two steps • Addressee identification • Large corpus of labeled adjacency pairs (APs): paired utterances, an A-part and a B-part • e.g. question-answer, offer-acceptance, apology-downplay • Train a system to determine who is the addressee (the A-part speaker) of any given utterance (B-part) in a meeting. • Agreement/disagreement classification • Features local to the utterance and pertaining to immediately preceding speech and silences. • Label-dependency features: dependencies between the current label (agree, disagree, …) and previous labels, modeled in a Bayesian network. • Addressee identification defines the topology of the Bayesian network.
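
The two-step pipeline on this slide can be summarized as a data-flow sketch. The names below (Utterance, find_addressee, classify) are hypothetical stand-ins, not the paper's actual interfaces; step 1 is filled in with the simple "last distinct speaker" baseline discussed later (slide 8), and step 2 is a placeholder.

```python
# Minimal sketch of the two-step pipeline; all names are illustrative only.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Utterance:
    speaker: str
    words: List[str]
    addressee: Optional[str] = None   # filled in by step 1
    label: Optional[str] = None       # AGREE / DISAGREE / BACKCHANNEL / OTHER

def find_addressee(utts: List[Utterance], i: int) -> Optional[str]:
    # Stand-in for step 1: the last distinct speaker before utterance i
    # (the baseline of slide 8, not the ranking model actually used).
    for prev in reversed(utts[:i]):
        if prev.speaker != utts[i].speaker:
            return prev.speaker
    return None

def classify(utts: List[Utterance], i: int) -> str:
    # Stand-in for step 2: a real model combines local features with
    # label-dependency features whose structure comes from the addressees.
    return "OTHER"

def process_meeting(utts: List[Utterance]) -> None:
    for i in range(len(utts)):
        utts[i].addressee = find_addressee(utts, i)   # step 1
    for i in range(len(utts)):
        utts[i].label = classify(utts, i)             # step 2
```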

  7. Corpus annotation • ICSI meeting corpus: 75 informal meetings recorded at UC Berkeley, averaging one hour in length and ranging from 3 to 9 participants. • Adjacency pair annotation [Dhillon et al. 04]: • All 75 meetings labeled with dialog acts and adjacency pairs. • Agreement/disagreement annotation [Hillard et al. 03]: • Annotation of 4 meeting segments, plus tags for 4 additional meetings obtained with a clustering method [Hillard et al. 03] • 8135 labeled utterances: 11.9% agreements, 6.8% disagreements, 23.2% backchannels, 58.1% other • Inter-labeler reliability: kappa coefficient of .63

  8. Step 1: Addressee (AP) identification • Baseline algorithm: • always assume that the addressee in an adjacency pair (A, B) is the party who spoke last before B. • Works reasonably well: 79.8% accuracy. • Our method: speaker ranking • rank all speakers S = (s1, …, sN) with probabilities reflecting how likely they are to be speaker A (i.e. the addressee). • Log-linear (maximum entropy) probability model for ranking: • di in D = (d1, …, dN) are observations pertaining to speaker si and to the last utterance of speaker si
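
The formula on this slide did not survive transcription. Under the notation above, a standard maximum-entropy ranker over the N candidate speakers would take the form below; the exact parameterization (feature functions f_k and weights λ_k) is a reconstruction, not copied from the slide.

```latex
% Hedged reconstruction of the log-linear (maximum entropy) ranking model
p(A = s_i \mid D) \;=\;
  \frac{\exp\bigl(\sum_k \lambda_k\, f_k(d_i)\bigr)}
       {\sum_{j=1}^{N} \exp\bigl(\sum_k \lambda_k\, f_k(d_j)\bigr)}
```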

  9. Features for AP identification • Structural: • number of speakers taking the floor between A and B (this single feature alone matches the baseline: 79.8%) • Durational features: • duration of A (short utterances generally do not elicit responses/reactions) • seconds of overlap with any other speaker (competitive speech is incompatible with AP construction) • Lexical features: • number of n-grams occurring in both A and B, for uni- to trigrams (A and B parts often have some words in common) • first word of A (to exploit cue words and detect wh-questions) • is the B speaker (addressee) named explicitly in A?
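
As an illustration, a few of these features can be computed as in the sketch below. The helper names and exact feature definitions are guesses for illustration, not the paper's feature extractor.

```python
# Illustrative extraction of some AP-identification features for a candidate
# A utterance and the B utterance. Names and definitions are assumptions.
from typing import Dict, List, Set

def ngrams(tokens: List[str], n: int) -> Set[tuple]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ap_features(a_words: List[str], b_words: List[str],
                a_duration: float, overlap_secs: float,
                speakers_between: int, b_speaker_name: str) -> Dict[str, float]:
    feats: Dict[str, float] = {
        # structural: speakers taking the floor between A and B
        "speakers_between": float(speakers_between),
        # durational: length of A and overlap with competing speech
        "a_duration": a_duration,
        "overlap_secs": overlap_secs,
        # lexical: shared uni-/bi-/trigrams between A and B
        "shared_ngrams": float(sum(
            len(ngrams(a_words, n) & ngrams(b_words, n)) for n in (1, 2, 3))),
        # lexical: is the B speaker named explicitly in A?
        "b_named_in_a": float(b_speaker_name.lower() in
                              (w.lower() for w in a_words)),
    }
    # lexical: first word of A (cue words, wh-questions) as an indicator
    if a_words:
        feats["first_word=" + a_words[0].lower()] = 1.0
    return feats
```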

  10. Adjacency pair identification: results • Experimental setting: 40 meetings used for training (9104 APs), 10 meetings used for testing (1723 APs), and 5 meetings in a held-out set used for forward feature selection and regularization (Gaussian smoothing)

  11. Step 2: Agreement/disagreement classification: local features of the utterance • Local features of the utterance include the ones used in [Hillard et al. 03] (but no acoustics). Best predictors: • Lexical features: • agreement and disagreement markers [Cohen, 02], adjectives with positive/negative polarity [Hatzivassiloglou and McKeown, 97], general cue phrases [Hirschberg and Litman, 94] • first word of the utterance • score according to four LMs (one for each class) • Structural and durational features: • duration of the utterance • speech rate
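
The class-conditional LM scores can be illustrated as follows. The sketch assumes simple add-one-smoothed unigram models trained per class; the slides do not specify the LM type, so this is only a stand-in.

```python
# Illustrative class-conditional LM scoring: one score per class
# (AGREE, DISAGREE, BACKCHANNEL, OTHER), used as utterance features.
# Add-one-smoothed unigram models are an assumption, not the paper's LMs.
import math
from collections import Counter
from typing import Dict, List

def train_unigram(utterances: List[List[str]]) -> Counter:
    counts: Counter = Counter()
    for utt in utterances:
        counts.update(utt)
    return counts

def logprob(counts: Counter, utt: List[str], vocab_size: int) -> float:
    total = sum(counts.values())
    return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in utt)

def lm_score_features(utt: List[str],
                      class_lms: Dict[str, Counter],
                      vocab_size: int) -> Dict[str, float]:
    # One feature per class: length-normalized log P(utterance | class LM).
    n = max(len(utt), 1)
    return {f"lm_{label}": logprob(lm, utt, vocab_size) / n
            for label, lm in class_lms.items()}
```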

  12. Label dependencies in sequence classification • Previous-tag feature p(ci|ci-1) is helpful in many NLP applications to model context: POS tagging, supertagging, dialog act classification. • Various families of Markov models to train (e.g. HMMs, CMMs, CRFs). • Limitations of fixed-order Markov models for representing multi-party conversations: • overlapping speech; no strict label ordering • multiple speakers with different opinions: the previous tag (speaker A) might affect the current tag (speaker B addressing A), but is unlikely to if B addresses C.

  13. Label dependency: previous-tag • Intuition: the previous tag affects the current tag. [Figure: A speaking to B; x-axis: tag index; BACKCHANNEL tags ignored for better interpretability]

  14. Label dependency: same-interactants previous tags • Intuition: If A disagreed with B (when A last spoke to B), then A is likely to disagree with B again.

  15. Label dependency: symmetry • Intuition: If B disagreed with A (when B last spoke to A), then A is likely to disagree with B.

  16. Label dependency: transitivity • Intuition: If A disagrees with C after C agreed with B, then we might expect A to disagree with B as well.
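
The dependency types on slides 13 to 16 all amount to choosing, for each utterance, which earlier labels become parents in the Bayesian network. The sketch below is a rough illustration of that selection, assuming each utterance already carries a speaker and an addressee; the function name and the exact lookback rules are illustrative, not the paper's definition.

```python
# Illustrative parent selection for the label-dependency structure.
from typing import List, Optional, Tuple

def dependency_parents(pairs: List[Tuple[str, Optional[str]]],
                       i: int) -> List[int]:
    """pairs[j] = (speaker, addressee) of utterance j; return label parents of i."""
    spk_i, adr_i = pairs[i]
    parents = set()
    if i > 0:
        parents.add(i - 1)                       # previous-tag dependency (slide 13)
    for j in range(i - 1, -1, -1):               # same-interactants (slide 14):
        if pairs[j] == (spk_i, adr_i):           # when A last spoke to B
            parents.add(j)
            break
    for j in range(i - 1, -1, -1):               # symmetry (slide 15):
        if pairs[j] == (adr_i, spk_i):           # when B last spoke to A
            parents.add(j)
            break
    # transitivity (slide 16) would add parents reached through a third
    # speaker C; omitted here to keep the sketch short
    return sorted(parents)
```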

  17. Parameter estimation • We use (dynamic) Bayes nets to factor the conditional probability distribution, with C = (c1, …, cL) the sequence of labels, D = (d1, …, dL) the sequence of observations, and pa(ci) the parents of ci, i.e. the label dependencies. • A (maximum entropy) log-linear model is used to estimate the probability of the dynamic variable ci.
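
The two formulas referenced on this slide did not make it into the transcript. Under the notation defined above, the factorization and the local log-linear model presumably take the standard forms below; the feature functions f_k and weights λ_k are assumed, so treat this as a reconstruction rather than the slide's exact equations.

```latex
% Bayes-net factorization of the label sequence given the observations
P(C \mid D) \;=\; \prod_{i=1}^{L} P\bigl(c_i \mid \mathrm{pa}(c_i),\, d_i\bigr)

% Local maximum-entropy (log-linear) model for each dynamic variable c_i
P\bigl(c_i \mid \mathrm{pa}(c_i),\, d_i\bigr) \;=\;
  \frac{\exp\bigl(\sum_k \lambda_k\, f_k(c_i, \mathrm{pa}(c_i), d_i)\bigr)}
       {\sum_{c'} \exp\bigl(\sum_k \lambda_k\, f_k(c', \mathrm{pa}(c_i), d_i)\bigr)}
```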

  18. Decoding of the maximizing sequence • Beam search • Maintain a beam of the B most likely left-to-right partial sequences (as in [Ratnaparkhi 96] for POS tagging). • In theory, search errors are possible. • In practice, our search is seldom affected by the beam size if it isn't too small: B = 100 is a reasonable value for any sequence.
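
A minimal left-to-right beam search over label sequences is sketched below. The scoring interface (score_fn, returning P(label | previous labels, observations)) is a hypothetical stand-in for the trained model; beam_size=100 corresponds to the value mentioned on the slide.

```python
# Minimal left-to-right beam search over label sequences.
import math
from typing import Callable, List, Sequence, Tuple

LABELS = ("AGREE", "DISAGREE", "BACKCHANNEL", "OTHER")

def beam_search(n_utterances: int,
                score_fn: Callable[[Tuple[str, ...], int, str], float],
                beam_size: int = 100,
                labels: Sequence[str] = LABELS) -> List[str]:
    # Each hypothesis is (log probability, partial label sequence).
    beam: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())]
    for i in range(n_utterances):
        expanded = [(lp + math.log(max(score_fn(seq, i, lab), 1e-12)),
                     seq + (lab,))
                    for lp, seq in beam
                    for lab in labels]
        # Keep only the beam_size most probable partial sequences.
        expanded.sort(key=lambda h: h[0], reverse=True)
        beam = expanded[:beam_size]
    return list(beam[0][1])
```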

  19. Results: comparison to previous work • 3-way classification (AGREE, DISAGREE, OTHER) as in [Hillard et al. 03]; priors are normalized. • The best-performing feature set represents a 27.3% error reduction over [Hillard et al. 03].

  20. Results: comparison to previous work • 3-way classification (AGREE, DISAGREE, OTHER) as in [Hillard et al. 03]; priors are normalized. • Label dependency features reduce error by 9%.

  21. Results: 4-way classification • 6-fold cross-validation, each fold on one meeting, representing a total of 8135 utterances to classify. • Contribution of label dependencies on different feature sets:

  22. Results: 4-way classification • Accuracies by label dependency type (assuming all other features, i.e. structural, durational, and lexical, are used):

  23. Conclusion and future work • Conclusion: • Performed addressee identification as a byproduct of agreement/disagreement classification. • AP identification significantly outperforms a competitive baseline. • Compelling evidence that models incorporating label dependency features are superior. • Future work: • Summarization: identification of the propositional content that was agreed or disagreed upon. • Addressee identification may also be beneficial for DA labeling of multi-party speech.

  24. Thank you

  25. Preceding-tags dependencies

  26. Preceding-tag dependency: transitivity
