Turn-taking in Mandarin Dialogue: Interactions of Tone and Intonation. Gina-Anne Levow, University of Chicago, October 14, 2005.
Roadmap • Motivation • Enabling fluent conversation • Data Collection and Processing • Acoustic Analysis of Turn-taking • Tone and Intonation • Recognizing Boundaries and Interruptions • Conclusions and Future Work
Turn-taking in Dialogue • Goal: Enable fluent conversation • Turn-taking is collaborative (Duncan 1974) • Requires producing and understanding cues • Crucial for dialogue agents and for dialogue understanding • End-pointing in spoken dialogue systems • Confusion of barge-in and backchannel
Challenges • Silence is neither sufficient nor necessary • Dialogue involves overlap • Overlaps are not arbitrary (Ward et al., 2000) • Proposed cues: • Multimodal: Gesture, Gaze • Not always available • Prosodic • Attested in English, Japanese • Tone languages?
Approach • Identify significant differences in • Pitch, intensity between initial/final positions • Intensity for different transition types • Pitch, intensity of interruptions vs smooth • Assess interaction of tone and intonation • Exploit contrasts for recognition of • Turn unit boundaries: ~93% • Interruptions: 62%
Data Collection • Taiwanese Putonghua Corpus • 5 spontaneous dialogues • ~20 minutes each • 7 female, 3 male speakers • Manually transcribed and word segmented • Turn beginnings and overlaps • Manually labelled and time-stamped
Data Processing • Automatic forced alignment • CU Sonic (Pellom et al.) language porting • Dictionary-based, manual pinyin-to-ARPABET mapping • Yields phone, syllable, word, and silence durations and positions • Acoustic analysis • Pitch, Intensity: Praat (Boersma, 2001) • Log-scaled and z-score normalized per side
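The per-side normalization step can be sketched as follows. This is a minimal illustration, assuming "per-side" means each speaker's channel of a dialogue is normalized separately, and that values are log-transformed before z-scoring (shown here for raw pitch in Hz). The function name and the dictionary layout are hypothetical, not taken from the original pipeline.

```python
import math
from statistics import mean, pstdev


def log_zscore_per_side(values_by_side):
    """Hypothetical helper: log-scale, then z-score, separately for each side.

    values_by_side maps a side/speaker id to a list of positive raw values
    (e.g. pitch in Hz). Returns a dict with the same keys and normalized lists.
    """
    normalized = {}
    for side, values in values_by_side.items():
        logs = [math.log(v) for v in values]   # log-scale each raw value
        mu = mean(logs)                         # per-side mean of log values
        sigma = pstdev(logs)                    # per-side (population) std dev
        # Guard against a constant track: zero variance maps everything to 0.
        normalized[side] = [(x - mu) / sigma if sigma > 0 else 0.0 for x in logs]
    return normalized


# Usage: two sides of one dialogue, each normalized on its own statistics.
example = log_zscore_per_side({"A": [180.0, 220.0, 200.0], "B": [110.0, 95.0, 130.0]})
```

Normalizing per side removes speaker-level differences (for example, between female and male speakers) so that "higher" or "lower" pitch is judged relative to each speaker's own range.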
Turn Unit Boundary Contrasts • Unit-initial versus unit-final syllables • Pitch significantly lower in final than initial • Intensity significantly lower in final than initial • Across all transition types • Rough versus smooth transitions • Final syllables • Intensity significantly higher in rough transitions
Characterizing Interruptions • Contrast first syllable of interruptions (“inter”) vs smooth starts (“smooth”) • Pitch significantly higher in interruptions • Intensity significantly higher in interruptions
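The slides report these contrasts as significant but do not name the test used. As an assumed stand-in, a two-sided permutation test on the difference of means is one way to check such a contrast on normalized values, for example unit-initial vs unit-final syllable pitch, or interruption vs smooth-start intensity. The function name and parameters are hypothetical, and the test assumes the group labels are exchangeable under the null hypothesis.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on mean(group_a) - mean(group_b).

    Returns (observed_difference, p_value). An assumed stand-in, not
    necessarily the test used in the original analysis.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                       # randomly reassign labels
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):            # at least as extreme
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

A positive observed difference with a small p-value for (initial, final) would correspond to the "lower in final than initial" pattern on the boundary slide.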
Interactions of Tone and Intonation • Clear intonational cues in tone language • What effect on tones? • Contrast tones in final vs non-final position • Mean pitch lowered in each tone • Relative height largely preserved • Contour lowered but largely preserved • Distinguishing tone characteristics retained
Interactions of Tone and Intonation • Mean pitch across tones • Tone contour changes
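The two claims on these slides (every tone is lowered in final position, and the relative height of the tones is preserved) can be checked with a short sketch. It assumes a hypothetical list of (tone, is_final, normalized_pitch) tuples per syllable and uses a strict version of "relative height preserved": the ranking of tones by mean pitch must be identical in both positions. The "largely preserved" wording on the slide may allow small exceptions that this strict check would not.

```python
from collections import defaultdict
from statistics import mean


def mean_pitch_by_tone(syllables):
    """Mean pitch per (tone, is_final) from hypothetical (tone, is_final, pitch) tuples."""
    buckets = defaultdict(list)
    for tone, is_final, pitch in syllables:
        buckets[(tone, is_final)].append(pitch)
    return {key: mean(vals) for key, vals in buckets.items()}


def tone_ranking(means, is_final):
    """Tones ordered from highest to lowest mean pitch in one position."""
    tones = [tone for (tone, final) in means if final == is_final]
    return sorted(tones, key=lambda tone: means[(tone, is_final)], reverse=True)


def all_tones_lowered(means):
    """True if every tone's final mean is below its non-final mean.

    Assumes each tone occurs in both positions.
    """
    tones = {tone for (tone, _) in means}
    return all(means[(t, True)] < means[(t, False)] for t in tones)


def relative_height_preserved(means):
    """Strict check: same tone ranking in non-final and final position."""
    return tone_ranking(means, False) == tone_ranking(means, True)
```

Toy values (not corpus data) such as tone 1 at 1.0 non-final vs 0.4 final, tone 3 at -0.6 vs -1.0, and so on would pass both checks, matching the qualitative pattern described.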
Recognizing Turn Unit Boundaries and Turn Types • Classifier: BoosTexter (Schapire 2000); 10-fold cross-validation • Comparable results for C4.5, SVMs • Prosodic features: • Local: • Pitch, Intensity: Mean, Max; Duration • Word, syllable • Contextual: • Difference between current and following word: pitch, intensity • Silence • Text features: • N-grams within preceding, following 5 syllables
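The slide lists the feature groups but not how they were assembled. The sketch below is one hypothetical way to build a feature dictionary for a candidate boundary after word i, assuming per-word prosodic values are already normalized, a 5-syllable window on each side of the boundary, and n-grams up to length 2 (the n-gram order is not given on the slide). The `Word` record, the feature names, and the choice of 0.0 for a missing following word are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical per-word record; prosodic values assumed normalized."""
    syllables: list          # pinyin syllables of the word
    mean_pitch: float
    max_pitch: float
    mean_intensity: float
    max_intensity: float
    duration: float
    silence_after: float     # seconds of silence following the word


def ngrams(tokens, max_n):
    """All n-grams of length 1..max_n, joined with '_'."""
    grams = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            grams.append("_".join(tokens[start:start + n]))
    return grams


def boundary_features(words, i, window=5, max_n=2):
    """Feature dict for a candidate turn-unit boundary after words[i]."""
    w = words[i]
    feats = {
        # Local prosodic features of the current word.
        "mean_pitch": w.mean_pitch,
        "max_pitch": w.max_pitch,
        "mean_int": w.mean_intensity,
        "max_int": w.max_intensity,
        "duration": w.duration,
        "n_syllables": len(w.syllables),
        "silence": w.silence_after,
    }
    # Contextual: current minus following word (assumed 0.0 at the end).
    if i + 1 < len(words):
        nxt = words[i + 1]
        feats["d_mean_pitch_next"] = w.mean_pitch - nxt.mean_pitch
        feats["d_mean_int_next"] = w.mean_intensity - nxt.mean_intensity
    else:
        feats["d_mean_pitch_next"] = 0.0
        feats["d_mean_int_next"] = 0.0
    # Text: n-grams over up to `window` syllables on each side of the boundary.
    before = [s for word in words[: i + 1] for s in word.syllables][-window:]
    after = [s for word in words[i + 1:] for s in word.syllables][:window]
    for g in ngrams(before, max_n):
        feats["prev:" + g] = 1
    for g in ngrams(after, max_n):
        feats["next:" + g] = 1
    return feats
```

For the interruption task, the same contextual difference would instead be taken against the previous word, as the Interruptions slide describes.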
Recognizing Turn Unit Boundaries • Word: Boundary/non-boundary • 3200 instances; down-sampled, balanced set • Key features: Silence, max intensity • Lexical features: preceding ‘ta’, following ‘dui’ • Prosodic features more robust without silence
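The "down-sampled, balanced set" and the 10-fold cross-validation can be sketched as follows: randomly drop majority-class instances until every class has as many as the smallest class, then split indices into k folds so each instance is tested exactly once. Both function names are hypothetical, and uniform random down-sampling is an assumption; the slides do not say how instances were selected.

```python
import random
from collections import defaultdict


def downsample_balanced(instances, labels, seed=0):
    """Randomly keep min-class-size items per class; returns shuffled (x, y) pairs."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for x, y in zip(instances, labels):
        by_label[y].append(x)
    n_min = min(len(v) for v in by_label.values())
    balanced = []
    for y, xs in by_label.items():
        for x in rng.sample(xs, n_min):   # sample without replacement
            balanced.append((x, y))
    rng.shuffle(balanced)
    return balanced


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx); every index appears in exactly one test fold."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test
```

On a balanced two-class set, always predicting one class scores 50%, which is the chance level that the ~93% boundary and 62% interruption accuracies should be compared against.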
Recognizing Interruptions • Initial words: Interruption/smooth start • >400 instances: down-sampled, balanced set • Contextual features: • Difference in pitch, intensity between current and previous word • Preceding silence • Best result: 62%, using all feature sets • Key feature: silence • Without silence, accuracy drops to chance
Discussion • Turn-taking in Mandarin Dialogue • Significant intonational, prosodic cues • Initiation/Finality: Lower final pitch, intensity • Turn transition types: • Rough vs smooth: higher final intensity • Interruptions vs smooth: higher pitch, intensity • Tones globally lowered; shape, relative height largely preserved • Exploit cues for boundary, interruption recognition • 93%, 62% respectively, with silence
Conclusions & Future Work • Intonational cues to turn-taking in Mandarin • Pitch jointly encodes lexical, dialogue meaning • Basic tone contrasts largely preserved • Prosodic information supports dialogue flow • Silence important, but other cues co-signal • Integrate dialogue information for tone recognition • Turn-taking, topic structure, etc.