Identifying Agreement and Disagreement in Conversational Speech: Use of Bayesian Networks to Model Pragmatic Dependencies Michel Galley, Kathleen McKeown, Julia Hirschberg Columbia University Elizabeth Shriberg SRI International
Motivation
• Problem: identification of agreements and disagreements between participants in meetings.
• Ultimate goal: automatic summarization. This enables us to generate "minutes" of meetings highlighting the debate that affected each decision.
Example
4-way classification: AGREE, DISAGREE, BACKCHANNEL, OTHER
Previous work
• Decision-tree classifiers [Hillard et al. 03]
  • CART-style tree learner.
  • Features local to the utterance: lexical, durational, and acoustic.
• Reasonably good accuracy in a 3-way classification (AGREE, DISAGREE, OTHER):
  • 71% with ASR output;
  • 82% with accurate transcription.
Extend [Hillard et al. 03] by investigating the effect of context
• Empirical questions:
  • Are preceding agreements/disagreements good predictors for the classification task?
  • Does the current label (agreement/disagreement) depend on the identity of the addressee?
  • Should we distinguish preceding labels by the identity of their corresponding addressees?
• The studies we report on show that the preceding context supplies good predictors.
• Addressee identification is instrumental to analyzing the preceding context.
Agreement/disagreement classification in two steps
• Step 1: Addressee identification
  • Large corpus of labeled adjacency pairs (APs): paired utterances A and B, e.g. question-answer, offer-acceptance, apology-downplay.
  • Train a system to determine who is the addressee (A-part) of any given utterance (B-part) in a meeting.
• Step 2: Agreement/disagreement classification
  • Features local to the utterance and pertaining to immediately preceding speech and silences.
  • Label-dependency features: dependencies between the current label (agree, disagree, …) and previous labels in a Bayesian network.
  • Addressee identification defines the topology of the Bayesian network.
(A high-level sketch of the two-step pipeline follows below.)
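For concreteness, here is a minimal Python sketch of the two-step pipeline described above. All names (Utterance, rank_addressee, classify_label, tag_meeting) are illustrative placeholders, not from the original system.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Utterance:
    speaker: str                       # who produced the B-part
    text: str
    start: float                       # start time in seconds
    end: float                         # end time in seconds
    addressee: Optional[str] = None    # filled in by step 1
    label: Optional[str] = None        # AGREE / DISAGREE / BACKCHANNEL / OTHER

def rank_addressee(utt: Utterance, context: List[Utterance]) -> str:
    """Step 1 (placeholder): pick the most probable A-part speaker for utt."""
    ...  # maxent ranker over candidate speakers (see the AP identification slides)

def classify_label(utt: Utterance, history: List[Utterance]) -> str:
    """Step 2 (placeholder): agree/disagree classification with local features
    and label-dependency features; the addressee links determine which
    previous labels the current one depends on."""
    ...

def tag_meeting(utterances: List[Utterance]) -> None:
    for i, utt in enumerate(utterances):
        utt.addressee = rank_addressee(utt, utterances[:i])   # step 1
        utt.label = classify_label(utt, utterances[:i])       # step 2
```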
Corpus annotation
• ICSI meeting corpus: 75 informal meetings recorded at UC Berkeley, averaging one hour each and ranging from 3 to 9 participants.
• Adjacency pair annotation [Dhillon et al. 04]:
  • All 75 meetings labeled with dialog acts and adjacency pairs.
• Agreement/disagreement annotation [Hillard et al. 03]:
  • Annotation of 4 meeting segments, plus tags for 4 additional meetings obtained with a clustering method [Hillard et al. 03].
  • 8135 labeled utterances: 11.9% agreements, 6.8% disagreements, 23.2% backchannels, 58.1% other.
  • Inter-labeler reliability: kappa coefficient of .63.
Step 1: Addressee (AP) identification
• Baseline algorithm:
  • always assume that the addressee in an adjacency pair (A, B) is the party who spoke last before B.
  • Works reasonably well: 79.8% accuracy.
• Our method: speaker ranking
  • rank all speakers S = (s1, …, sN) with probabilities reflecting how likely they are to be speaker A (i.e. the addressee).
  • Log-linear (maximum entropy) probability model for ranking (see below).
  • Each di in D = (d1, …, dN) is the set of observations pertaining to speaker si and to the last utterance of speaker si.
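The ranking formula itself did not survive the slide export. A standard conditional log-linear (maximum entropy) ranker consistent with the description above, with generic feature functions f_k and weights λ_k, would be:

```latex
P(s_i \mid D) \;=\;
  \frac{\exp\big(\sum_k \lambda_k\, f_k(d_i)\big)}
       {\sum_{j=1}^{N} \exp\big(\sum_k \lambda_k\, f_k(d_j)\big)},
\qquad
\hat{A} \;=\; \arg\max_{s_i \in S} P(s_i \mid D)
```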
Features for AP identification
• Structural:
  • number of speakers taking the floor between A and B (this single feature matches the baseline: 79.8%).
• Durational:
  • duration of A (short utterances generally do not elicit responses/reactions).
  • seconds of overlap with any other speaker (competitive speech is incompatible with AP construction).
• Lexical:
  • number of n-grams shared by A and B (uni- to trigrams): A and B parts often have some words in common.
  • first word of A: exploits cue words and detects wh-questions.
  • is the B speaker (addressee) named explicitly in A?
(A feature-extraction sketch follows below.)
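A minimal sketch of how these features could be computed, assuming tokenized utterances; all function and argument names are illustrative rather than taken from the original system.

```python
from typing import List, Set, Tuple

def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ap_features(a_tokens: List[str], b_tokens: List[str],
                a_duration: float, a_overlap: float,
                speakers_between: int, b_speaker_name: str) -> dict:
    """Feature vector for one candidate A-part, given the B-part."""
    feats = {
        # structural: how many other speakers took the floor between A and B
        "speakers_between": speakers_between,
        # durational: short A-parts rarely elicit a response
        "a_duration": a_duration,
        # durational: overlapped (competitive) speech rarely forms an AP
        "a_overlap_seconds": a_overlap,
        # lexical: cue word / wh-question detection
        "a_first_word": a_tokens[0].lower() if a_tokens else "<empty>",
        # lexical: is the B speaker named explicitly in A?
        "b_named_in_a": b_speaker_name.lower() in (t.lower() for t in a_tokens),
    }
    # lexical: n-gram overlap between A and B (unigrams to trigrams)
    for n in (1, 2, 3):
        feats[f"shared_{n}grams"] = len(ngrams(a_tokens, n) & ngrams(b_tokens, n))
    return feats
```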
Adjacency pair identification: results
• Experimental setting:
  • 40 meetings used for training (9104 APs);
  • 10 meetings used for testing (1723 APs);
  • 5 meetings of a held-out set used for forward feature selection and regularization (Gaussian smoothing).
Step 2: Agreement/disagreement classification: local features of the utterance
• Local features of the utterance include those used in [Hillard et al. 03] (but no acoustics). Best predictors:
• Lexical features:
  • agreement and disagreement markers [Cohen, 02], adjectives with positive/negative polarity [Hatzivassiloglou and McKeown, 97], general cue phrases [Hirschberg and Litman, 94];
  • first word of the utterance;
  • score of the utterance under four class-specific language models (one per class).
• Structural and durational features:
  • duration of the utterance;
  • speech rate.
(A sketch of the class-LM score feature follows below.)
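The class-LM score feature can be illustrated with a toy unigram model; the slide does not specify the LM order, so the unigram choice and all names here are only an assumption.

```python
import math
from collections import Counter
from typing import Dict, List

CLASSES = ("AGREE", "DISAGREE", "BACKCHANNEL", "OTHER")

def train_unigram_lm(utterances: List[List[str]]) -> Dict[str, float]:
    """Add-one-smoothed unigram log-probabilities for one class."""
    counts = Counter(tok for utt in utterances for tok in utt)
    total = sum(counts.values())
    vocab = len(counts) + 1                     # +1 slot for unseen tokens
    lm = {tok: math.log((c + 1) / (total + vocab)) for tok, c in counts.items()}
    lm["<unk>"] = math.log(1 / (total + vocab))
    return lm

def lm_score_features(tokens: List[str],
                      lms: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """One length-normalized log-probability feature per class LM."""
    n = max(len(tokens), 1)
    return {f"lm_{c}": sum(lm.get(t, lm["<unk>"]) for t in tokens) / n
            for c, lm in lms.items()}
```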
Label dependencies in sequence classification
• A previous-tag feature p(ci | ci-1) is helpful in many NLP applications as a model of context: POS tagging, supertagging, dialog act classification.
• Various families of Markov models can be trained (e.g. HMMs, CMMs, CRFs).
• Limitations of fixed-order Markov models for representing multi-party conversations:
  • overlapping speech: no strict label ordering;
  • multiple speakers with different opinions: the previous tag (speaker A) might affect the current tag (speaker B addressing A), but this is unlikely if B addresses C.
Label dependency: previous-tag
• Intuition: the previous tag affects the current tag.
[Figure: tag sequence for A speaking to B, plotted by tag index; BACKCHANNEL tags ignored for better interpretability.]
Label dependency: same-interactants previous tags
• Intuition: if A disagreed with B (when A last spoke to B), then A is likely to disagree with B again.
Label dependency: symmetry
• Intuition: if B disagreed with A (when B last spoke to A), then A is likely to disagree with B.
Label dependency: transitivity
• Intuition: if A disagrees with C after C agreed with B, then we might expect A to disagree with B as well.
(A sketch of how these dependency links could be located is shown below.)
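A rough sketch of how the three dependency links above could be located in a meeting transcript. The exact definitions in the paper are tied to the adjacency-pair structure from step 1, so the lookup below, especially for transitivity, is only an approximation with illustrative names.

```python
from typing import Dict, List, NamedTuple, Optional

class Turn(NamedTuple):
    speaker: str
    addressee: str
    label: Optional[str]   # AGREE / DISAGREE / OTHER, filled during decoding

def dependency_parents(turns: List[Turn], i: int) -> Dict[str, Optional[int]]:
    """Indices of earlier turns that the current label may depend on."""
    cur = turns[i]
    same_interactants = symmetry = transitivity = None
    for j in range(i - 1, -1, -1):
        prev = turns[j]
        # same-interactants: last time the same speaker addressed the same person
        if same_interactants is None and \
           (prev.speaker, prev.addressee) == (cur.speaker, cur.addressee):
            same_interactants = j
        # symmetry: last time the current addressee addressed the current speaker
        if symmetry is None and \
           (prev.speaker, prev.addressee) == (cur.addressee, cur.speaker):
            symmetry = j
        # transitivity: approximated here as the last turn in which a third
        # party C addressed the current addressee B, before A (now) reacts
        if transitivity is None and prev.addressee == cur.addressee and \
           prev.speaker not in (cur.speaker, cur.addressee):
            transitivity = j
    return {"same_interactants": same_interactants,
            "symmetry": symmetry,
            "transitivity": transitivity}
```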
Parameter estimation
• We use (dynamic) Bayes nets to factor the conditional probability distribution P(C | D), where:
  • C = (c1, …, cL): sequence of labels;
  • D = (d1, …, dL): sequence of observations;
  • pa(ci): parents of ci, i.e. the label dependencies described above.
• A (maximum entropy) log-linear model is used to estimate the probability of the dynamic variable ci (see the reconstructed formulas below).
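The two formulas referenced on this slide were lost in the slide export; the standard forms implied by the definitions above, with generic feature functions f_k and weights λ_k, are:

```latex
P(C \mid D) \;=\; \prod_{i=1}^{L} P\big(c_i \,\big|\, \mathrm{pa}(c_i),\, d_i\big),
\qquad
P\big(c_i \,\big|\, \mathrm{pa}(c_i),\, d_i\big) \;=\;
  \frac{\exp\big(\sum_k \lambda_k\, f_k(c_i, \mathrm{pa}(c_i), d_i)\big)}
       {\sum_{c'} \exp\big(\sum_k \lambda_k\, f_k(c', \mathrm{pa}(c_i), d_i)\big)}
```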
Decoding the maximizing sequence
• Beam search:
  • maintain a beam of the B most likely left-to-right partial sequences (as in [Ratnaparkhi 96] for POS tagging);
  • in theory, search errors are possible;
  • in practice, our search is seldom affected by the beam size if it isn't too small: B = 100 is a reasonable value for any sequence.
(A beam-search sketch follows below.)
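A minimal beam-search sketch over label sequences; log_prob stands in for the maxent model P(ci | pa(ci), di) of the previous slide, and all names are illustrative.

```python
from typing import Callable, List, Tuple

LABELS = ("AGREE", "DISAGREE", "BACKCHANNEL", "OTHER")

def beam_search(num_utts: int,
                log_prob: Callable[[int, Tuple[str, ...], str], float],
                beam_size: int = 100) -> List[str]:
    """Return an (approximately) most likely label sequence, left to right."""
    beam: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())]
    for i in range(num_utts):
        # extend every partial sequence in the beam with every possible label
        candidates = [(score + log_prob(i, prefix, label), prefix + (label,))
                      for score, prefix in beam
                      for label in LABELS]
        # keep only the beam_size best partial sequences
        beam = sorted(candidates, key=lambda x: x[0], reverse=True)[:beam_size]
    return list(beam[0][1])
```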
Results: comparison to previous work
• 3-way classification (AGREE, DISAGREE, OTHER) as in [Hillard et al. 03]; priors are normalized.
• The best-performing feature set represents a 27.3% error reduction over [Hillard et al. 03].
Results: comparison to previous work
• 3-way classification (AGREE, DISAGREE, OTHER) as in [Hillard et al. 03]; priors are normalized.
• Label-dependency features reduce error by 9%.
Results: 4-way classification
• 6-fold cross-validation, each fold held out on one meeting, for a total of 8135 utterances to classify.
• Contribution of label dependencies for different feature sets:
Results: 4-way classification
• Accuracies by label dependency type (assuming all other features, i.e. structural, durational, and lexical, are used):
Conclusion and future work
• Conclusion:
  • Performed addressee identification as a byproduct of agreement/disagreement classification.
  • AP identification significantly outperforms a competitive baseline.
  • Compelling evidence that models incorporating label-dependency features are superior.
• Future work:
  • Summarization: identification of the propositional content that was agreed or disagreed upon.
  • Addressee identification may also be beneficial for dialog act labeling of multi-party speech.