Prosodic Cues to Discourse Segment Boundaries in Human-Computer Dialogue
SIGDial 2004
Gina-Anne Levow
April 30, 2004
Roadmap
• Motivation
• Data Collection
• Segment Boundary Selection
• Feature Extraction & Analysis
• Cues to Segment Boundaries
• Preliminary Classification Study
• Conclusion
Why Segment?
• Enables language understanding tasks
  • Reference resolution
    • Anaphors typically refer to entities in current segment
  • Summarization
    • Identify and represent range of topics
• Conversational understanding
  • Constrain recognition
  • Different interpretations in different contexts
Approaches to Segmentation
• Monologue
  • Text similarity:
    • Vector space, language model, cue phrases
    • (Hearst, 1994; Beeferman et al., 1999; Marcu, 2000)
  • Prosodic cues (with text):
    • Pitch, amplitude, duration, pause
    • (Nakatani et al., 1995; Swerts, 1997; Tur et al., 2001)
• Human dialogue
  • Dialogue act classification (Shriberg et al., 1998; Taylor et al., 1998)
    • Text: language models; Prosody: contour, accent type
  • Multi-party segmentation
    • Text + silence (Galley et al., 2003)
Prosody in Human-Computer Dialogue
• Errors in speech recognition
  • Prosody provides additional source of evidence
• Topic change can be expensive
• Possible contrasts to human-human dialogue
  • More stilted speaking style
  • Slow conversation
Data Collection
• System:
  • SpeechActs (Sun Microsystems, 1993-1996)
  • Voice-only interface to desktop applications
  • Email, calendar, weather, stock quotes, time, currency
• Data:
  • 60 hours, collected during field trial
  • 19 subjects: 4 expert, 14 novice, guest
  • Recorded and logged: 8 kHz, 8-bit μ-law
  • Manually transcribed
  • > 7500 user utterances
Discourse Segment Boundary Data
• Focus:
  • High-level discourse segment boundaries
    • Not fine-grained subtopic analysis (future work)
    • More reliably coded and extracted (Swerts, 1997; Nakatani et al., 1995)
  • Task-based correspondence
    • Align with changes from application to application
    • Reliably extractable from current data set
Data Set
• Paired data set:
  • Discourse segment-final and segment-initial pairs
  • User utterances
  • Last command in current application, and application change command
    • U: What’s the price for Sun? (segment-final)
    • S: …
    • U: Switch to mail. (segment-initial)
• 473 pairs
  • Extracted automatically
  • Alignment, content verified manually
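The pairing rule on this slide (last command in the current application, followed by the application change command) can be sketched as below. This is only an illustration: the `SWITCH_PATTERN` regular expression is a hypothetical stand-in, since the slides do not give the SpeechActs command grammar, and every sample utterance other than the two quoted on the slide is made up.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for an application-change command; the real
# SpeechActs command grammar is not given in the slides.
SWITCH_PATTERN = re.compile(r"^\s*switch to \w+", re.IGNORECASE)


def extract_boundary_pairs(user_utterances: List[str]) -> List[Tuple[str, str]]:
    """Pair each application-change command with the user utterance before it.

    Returns (segment_final, segment_initial) tuples in transcript order.
    """
    pairs = []
    for previous, current in zip(user_utterances, user_utterances[1:]):
        # Skip back-to-back switches: the previous utterance must be an
        # ordinary command for it to count as segment-final.
        if SWITCH_PATTERN.match(current) and not SWITCH_PATTERN.match(previous):
            pairs.append((previous, current))
    return pairs


# Transcribed user turns only (system turns omitted); sample data is invented
# apart from the two utterances quoted on the slide.
utterances = [
    "What's the price for Sun?",
    "Switch to mail.",
    "Read the first message.",
    "Next message.",
    "Switch to calendar.",
]
pairs = extract_boundary_pairs(utterances)
for segment_final, segment_initial in pairs:
    print(f"final: {segment_final!r}  initial: {segment_initial!r}")
```

Automatic extraction of this kind would still need the manual check the slide mentions, because a pattern match says nothing about whether the audio and transcript are aligned.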
Acoustic Analysis
• Features:
  • Pitch and intensity
    • Extracted automatically (Praat)
    • 5-point median smoothed
    • Normalized per-speaker/call
• Scalar measures:
  • Maximum, minimum, mean
  • Full utterance
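A minimal sketch of the post-processing steps listed on this slide, assuming the pitch track has already been extracted as a list of values in Hz. The slides do not say how normalization was done; dividing by a per-speaker mean is an assumption made here for illustration, and the sample track and `speaker_mean` value are invented.

```python
from statistics import mean, median
from typing import Dict, List


def median_smooth(track: List[float], window: int = 5) -> List[float]:
    """Running median; the window shrinks at the edges of the track."""
    half = window // 2
    return [
        median(track[max(0, i - half): i + half + 1])
        for i in range(len(track))
    ]


def normalize(track: List[float], reference_mean: float) -> List[float]:
    """Assumed normalization: express each value relative to a per-speaker mean."""
    return [value / reference_mean for value in track]


def scalar_measures(track: List[float]) -> Dict[str, float]:
    """Maximum, minimum, and mean over the full utterance."""
    return {"max": max(track), "min": min(track), "mean": mean(track)}


# Invented pitch track (Hz) with one spike of the kind a pitch tracker
# can produce; the 5-point median removes it.
raw_pitch = [100, 102, 400, 104, 106, 108, 110]
speaker_mean = 105.0  # hypothetical per-speaker mean pitch

smoothed = median_smooth(raw_pitch)
normalized = normalize(smoothed, speaker_mean)
measures = scalar_measures(normalized)
print(measures)
```

Smoothing before taking the maximum and minimum matters because those two measures are the most sensitive to isolated tracking errors.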
Acoustic Contrasts
• Pitch:
  • Segment-initial vs segment-final:
    • Maximum, minimum, and mean significantly higher in segment-initial
    • Lower final fall in segment-final
• Intensity:
  • Segment-initial vs segment-final:
    • Mean intensity significantly higher in segment-initial
    • No other measures significant
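The slides report significant differences but do not name the statistical test. Because the data come as matched segment-final and segment-initial pairs, one natural choice is a paired t-test; the sketch below computes only the paired t statistic and is an assumption, not the test used in the study. The values are invented normalized pitch means.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List


def paired_t_statistic(initial: List[float], final: List[float]) -> float:
    """t statistic for the mean of the per-pair differences (initial - final)."""
    if len(initial) != len(final):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(initial, final)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error


# Invented normalized mean pitch for five utterance pairs.
initial_pitch_mean = [1.10, 1.05, 1.20, 0.98, 1.15]
final_pitch_mean = [0.95, 0.97, 1.02, 0.99, 1.00]

t_value = paired_t_statistic(initial_pitch_mean, final_pitch_mean)
print(f"t = {t_value:.3f} with {len(initial_pitch_mean) - 1} degrees of freedom")
```

A positive statistic here means the segment-initial value tends to exceed its segment-final partner; whether it is significant depends on comparing it against the t distribution for the given degrees of freedom.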
Discussion
• Segment-initial utterances
  • Significantly higher in pitch and intensity
• Largest contrast
  • Dramatically lower pitch in segment-final
  • Low pitch as cue to topic finality
• Robust cues to discourse segment boundaries
Classification: Preliminary Experiments
• Automatic prosody-based identification of segment boundaries
• Question: Does a pair of utterances span a segment boundary?
• Data:
  • Ordered utterance pairs:
    • Half segment-final + segment-initial
    • Half non-boundary
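One way to assemble a half-boundary, half-non-boundary set of ordered pairs is sketched below. How the non-boundary pairs were actually chosen is not stated in the slides; taking adjacent utterances and randomly downsampling the larger class is an assumption, and the utterance identifiers are placeholders.

```python
import random
from typing import List, Set, Tuple

# (first utterance, second utterance, spans_boundary)
Pair = Tuple[str, str, bool]


def build_balanced_pairs(
    utterances: List[str], segment_initial: Set[int], seed: int = 0
) -> List[Pair]:
    """Label adjacent ordered pairs and balance the two classes.

    segment_initial holds the indices of utterances that open a new segment.
    """
    boundary: List[Pair] = []
    non_boundary: List[Pair] = []
    for i in range(1, len(utterances)):
        pair = (utterances[i - 1], utterances[i], i in segment_initial)
        (boundary if pair[2] else non_boundary).append(pair)
    # Downsample the larger class so the two halves are equal in size.
    n = min(len(boundary), len(non_boundary))
    rng = random.Random(seed)
    return rng.sample(boundary, n) + rng.sample(non_boundary, n)


# Placeholder utterance ids; u3 and u6 are assumed to open new segments.
utterance_ids = [f"u{i}" for i in range(8)]
dataset = build_balanced_pairs(utterance_ids, segment_initial={3, 6})
for first, second, spans_boundary in dataset:
    print(first, second, spans_boundary)
```

With balanced classes, chance accuracy is 50%, which gives a baseline for reading a reported accuracy figure.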
Classifier and Features
• Decision tree classifier (C4.5)
• Features:
  • Pitch and intensity
    • Maximum, minimum, mean
  • Values for each utterance
  • Differences across pair
• Preliminary classification results:
  • 70-80% accuracy
• Key features:
  • Minimum pitch, average intensity
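The feature layout on this slide (per-utterance values plus differences across the pair) can be sketched as follows, together with a single threshold split chosen by information gain. The split search is a simplified stand-in for one step of decision tree induction, not C4.5 itself (which uses gain ratio and builds a full, pruned tree). All names and numbers are illustrative assumptions.

```python
from math import log2
from typing import Dict, List, Optional, Tuple

MEASURES = ["pitch_max", "pitch_min", "pitch_mean",
            "intensity_max", "intensity_min", "intensity_mean"]


def pair_features(first: Dict[str, float], second: Dict[str, float]) -> Dict[str, float]:
    """Per-utterance values plus second-minus-first differences."""
    features = {}
    for name in MEASURES:
        features["first_" + name] = first[name]
        features["second_" + name] = second[name]
        features["diff_" + name] = second[name] - first[name]
    return features


def entropy(labels: List[bool]) -> float:
    """Binary entropy in bits; zero for an empty or pure set."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def best_threshold(values: List[float], labels: List[bool]) -> Tuple[Optional[float], float]:
    """Return (cut, information gain) for the best split `value <= cut`."""
    base = entropy(labels)
    best_cut, best_gain = None, 0.0
    ordered = sorted(set(values))
    for low, high in zip(ordered, ordered[1:]):
        cut = (low + high) / 2  # candidate midway between neighbouring values
        left = [l for v, l in zip(values, labels) if v <= cut]
        right = [l for v, l in zip(values, labels) if v > cut]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if base - remainder > best_gain:
            best_cut, best_gain = cut, base - remainder
    return best_cut, best_gain


# Invented normalized measures for one utterance pair.
first = dict(zip(MEASURES, [1.20, 0.70, 0.95, 1.10, 0.80, 0.98]))
second = dict(zip(MEASURES, [1.35, 0.90, 1.10, 1.15, 0.85, 1.05]))
features = pair_features(first, second)

# Invented toy column of diff_pitch_min values with boundary labels.
diff_pitch_min = [0.30, 0.25, 0.20, -0.05, 0.02, -0.10]
spans_boundary = [True, True, True, False, False, False]
cut, gain = best_threshold(diff_pitch_min, spans_boundary)
print(f"split at diff_pitch_min <= {cut:.3f}, gain = {gain:.3f} bits")
```

On the toy column the two classes separate perfectly at a single cut; real data would not, which is why a tree combines several such splits.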
Classifier Tree
[Figure: decision tree diagram; only the node-label fragment "Min" survives]
Conclusions & Future Work
• Discourse segment boundaries in HCI
  • Segment-initial utterances
    • Significant increases in pitch and intensity
    • Relative to segment-final
  • Robust contrastive use of pitch and intensity
• Preliminary classification efforts: 70-80%
  • Difference in pitch minimum, intensity
• Extend to subdialogue structure
  • Richer feature set, data set