
Prosodic Cues to Discourse Segment Boundaries in Human-Computer Dialogue

Gina-Anne Levow. SIGDial 2004, April 30, 2004.


Presentation Transcript


  1. Prosodic Cues to Discourse Segment Boundaries in Human-Computer Dialogue • SIGDial 2004 • Gina-Anne Levow • April 30, 2004

  2. Roadmap • Motivation • Data Collection • Segment Boundary Selection • Feature Extraction & Analysis • Cues to Segment Boundaries • Preliminary Classification Study • Conclusion

  3. Why Segment? • Enables language understanding tasks • Reference resolution • Anaphors typically refer to entities in current segment • Summarization • Identify and represent range of topics • Conversational understanding • Constrain recognition • Different interpretations in different contexts

  4. Approaches to Segmentation • Monologue • Text similarity: • Vector space, language model, cue phrases • (Hearst 1994, Beeferman et al. 1999, Marcu 2000) • Prosodic cues (with text): • Pitch, amplitude, duration, pause • (Nakatani et al. 1995, Swerts 1997, Tur et al. 2001) • Human dialogue • Dialogue act classification (Shriberg et al. 1998, Taylor et al. 1998) • Text: language models; Prosody: contour, accent type • Multi-party segmentation • Text + silence (Galley et al. 2003)

  5. Prosody in Human-Computer Dialogue • Errors in speech recognition • Prosody provides additional source of evidence • Topic change can be expensive • Possible contrasts to human-human dialogue • More stilted speaking style • Slow conversation

  6. Data Collection • System: • SpeechActs (Sun Microsystems, 1993-1996) • Voice-only interface to desktop applications • Email, calendar, weather, stock quotes, time, currency • Data: • 60 hours, collected during field trial • 19 subjects: 4 expert, 14 novice, guest • Recorded: 8 kHz, 8-bit μ-law; logged • Manually transcribed • > 7,500 user utterances

  7. Discourse Segment Boundary Data • Focus: • High-level discourse segment boundaries • Not fine-grained subtopic analysis (future work) • More reliably coded and extracted • (Swerts 1997, Nakatani et al. 1995) • Task-based correspondence • Align with changes from application to application • Reliably extractable from current data set

  8. Data Set • Paired data set: • Discourse segment-final and segment-initial pairs • User utterances • Last command in current application, and application change command • U: What’s the price for Sun? Segment-final • S: … • U: Switch to mail. Segment-initial • 473 pairs • Extracted automatically • Alignment, content verified manually
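The pairing rule on this slide (the last command in the current application, followed by the application change command) can be sketched as below. This is only an illustration: the `UserTurn` record, its `app_before`/`app_after` fields, and the third example utterance are hypothetical, since the transcript does not describe the SpeechActs log format.

```python
from typing import List, NamedTuple, Tuple


class UserTurn(NamedTuple):
    """Hypothetical log record for one user utterance (not the real log format)."""
    text: str
    app_before: str  # application active when the user spoke
    app_after: str   # application active after the system handled the utterance


def extract_boundary_pairs(turns: List[UserTurn]) -> List[Tuple[str, str]]:
    """Return (segment-final, segment-initial) text pairs.

    Assumption: an utterance is segment-initial when it changes the active
    application, and the user utterance just before it is segment-final.
    """
    pairs = []
    for prev, cur in zip(turns, turns[1:]):
        if cur.app_before != cur.app_after:
            pairs.append((prev.text, cur.text))
    return pairs


turns = [
    UserTurn("What's the price for Sun?", "stocks", "stocks"),
    UserTurn("Switch to mail.", "stocks", "mail"),
    UserTurn("Read the first message.", "mail", "mail"),  # invented example
]
pairs = extract_boundary_pairs(turns)
```

In the study the extracted pairs were also checked manually for alignment and content; a sketch like this covers only the automatic step.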

  9. Acoustic Analysis • Features: • Pitch and intensity • Extracted automatically (Praat) • 5-point median smoothed • Normalized per-speaker/call • Scalar measures: • Maximum, minimum, mean • Full utterance
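A minimal sketch of the feature pipeline on this slide: 5-point median smoothing, per-speaker normalization, then maximum, minimum, and mean over the full utterance. Several details are assumptions rather than the author's method: the slides do not say how normalization was done (z-scoring is used here), the function names are invented, and the input is assumed to be a list of voiced-frame pitch or intensity values already exported from Praat.

```python
from statistics import mean, median, pstdev


def median_smooth(track, window=5):
    """Sliding median filter; the window shrinks at the edges of the track."""
    half = window // 2
    return [median(track[max(0, i - half):i + half + 1])
            for i in range(len(track))]


def speaker_stats(tracks):
    """Mean and standard deviation over all frames of one speaker (or call)."""
    frames = [value for track in tracks for value in track]
    return mean(frames), pstdev(frames)


def utterance_measures(track, mu, sigma):
    """Smooth, z-score against the speaker statistics (an assumed
    normalization), and reduce the utterance to three scalar measures."""
    smoothed = median_smooth(track)
    normed = [(value - mu) / sigma for value in smoothed]
    return {"max": max(normed), "min": min(normed), "mean": mean(normed)}


# A single-frame spike (300) is removed by the median filter.
smoothed_example = median_smooth([100, 100, 300, 100, 100])
measures_example = utterance_measures([1, 2, 3, 4, 5], mu=3, sigma=2)
```

The same functions would be applied once to the pitch track and once to the intensity track of each utterance. A real pipeline would also need to guard against a zero standard deviation.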

  10. Acoustic Contrasts • Pitch: • Segment-initial vs segment-final: • Maximum, minimum, and mean significantly higher • Lower final fall in segment-final • Intensity: • Segment-initial vs segment-final: • Mean intensity significantly higher • No other measures significant
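The slides report which contrasts are significant but do not name the test. Because the data are matched segment-final and segment-initial pairs, one plausible way to examine such a contrast is a paired t statistic on a single measure. The sketch below is an assumption about how that could be done, not a description of the analysis actually run, and the numbers are invented.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(final_values, initial_values):
    """t statistic for paired samples: mean difference over its standard error.

    Positive values mean the segment-initial measure tends to be higher.
    """
    if len(final_values) != len(initial_values):
        raise ValueError("paired samples must have the same length")
    diffs = [i - f for f, i in zip(final_values, initial_values)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Invented normalized pitch maxima for four utterance pairs.
segment_final = [1.0, 2.0, 3.0, 4.0]
segment_initial = [2.0, 4.0, 5.0, 7.0]
t_value = paired_t_statistic(segment_final, segment_initial)
```

Turning the statistic into a p-value requires the t distribution with n - 1 degrees of freedom, which is left out here to keep the sketch dependency-free.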

  11. Acoustic Contrasts

  12. Discussion • Segment-initial utterances • Significantly higher in pitch and intensity • Largest contrast • Dramatically lower pitch in segment-final • Low pitch as cue to topic finality • Robust cues to discourse segment boundaries

  13. Classification: Preliminary Experiments • Automatic prosody-based identification of segment boundaries • Question: Does a pair of utterances span a segment boundary? • Data: • Ordered utterance pairs: • Half segment-final + segment-initial • Half non-boundary

  14. Classifier and Features • Decision tree classifier (C4.5) • Features: • Pitch and intensity • Maximum, minimum, mean • Values for each utterance • Differences across pair • Preliminary classification results: • 70-80% accuracy • Key features: • Minimum pitch, average intensity
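The feature set on this slide (each scalar measure for both utterances, plus the difference across the pair) can be assembled as in the sketch below. The measure names and the dictionary layout are hypothetical, and the decision tree itself is not shown: the study used C4.5, for which no standard-library implementation exists.

```python
# Hypothetical names for the six scalar measures per utterance.
MEASURES = (
    "pitch_max", "pitch_min", "pitch_mean",
    "intensity_max", "intensity_min", "intensity_mean",
)


def pair_features(first, second):
    """Build one feature dict for an ordered utterance pair.

    `first` and `second` map each name in MEASURES to a normalized value.
    The difference is taken as second minus first (an assumed convention).
    """
    features = {}
    for name in MEASURES:
        features["first_" + name] = first[name]
        features["second_" + name] = second[name]
        features["diff_" + name] = second[name] - first[name]
    return features


# Invented values: the second utterance is higher on every measure.
first_utt = {name: -0.5 for name in MEASURES}
second_utt = {name: 1.0 for name in MEASURES}
example = pair_features(first_utt, second_utt)
```

Each pair then yields 18 features, labeled as boundary or non-boundary, which is the form a decision tree learner would consume.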

  15. Classifier Tree • [Figure: decision tree; not preserved in the transcript]

  16. Conclusions & Future Work • Discourse segment boundaries in HCI • Segment-initial utterances • Significant increases in pitch and intensity • Relative to segment-final • Robust contrastive use of pitch and intensity • Preliminary classification efforts: 70-80% • Difference in pitch minimum, intensity • Extend to subdialogue structure • Richer feature set, data set

