Recognizing Structure: Dialogue Acts and Segmentation Julia Hirschberg CS 6998
Today • Recognizing structural information from speech • Topic structure • Speech/dialogue acts • Applications • Speech browsing and search of large corpora • Broadcast News (NIST TREC SDR track) • Topic Detection and Tracking (NIST/DARPA TDT) • Customer care call recordings, focus groups, voicemail
Discourse Structure and Topic Structure • Intention-based accounts • Grosz & Sidner ‘86 • Conversational moves (games) • Edinburgh map task dialogues • Adjacency pairs • Schegloff, Sacks, Jefferson
Indicators of Topic Structure • Cue phrases: now, well, first • Pronominal reference • Orthography and formatting -- in text • Lexical information (Hearst ‘94, Reynar ’98, Beeferman et al ‘99) • In speech?
Prosodic Correlates of Discourse/Topic Structure • Pitch range Lehiste ’75, Brown et al ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96 • Preceding pause Lehiste ’79, Chafe ’80, Brown et al ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96
Rate Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96 • Amplitude Brown et al ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96 • Contour Brown et al ’83, Woodbury ’87, Swerts et al ’92
Prosodic Cues to Sentence and Topic Boundaries: Shriberg et al ’00 • Prosodic cues perform as well as or better than text-based cues at topic segmentation -- and generalize better? • Goal: identify sentence and topic boundaries at ASR-defined word boundaries • CART decision trees provided boundary predictions • HMM combined these with lexical boundary predictions
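To make the idea of combining two knowledge sources concrete, here is a minimal sketch that merges a prosodic boundary probability with a lexical one at each word boundary. It uses plain linear interpolation, which is a deliberately simpler stand-in and not the HMM combination named on the slide. The function names, the weight, the threshold, and the probabilities are all hypothetical.

```python
def combine(p_prosody, p_lexical, weight=0.5):
    """Linearly interpolate two boundary probabilities.

    weight is the share given to the prosodic model (an assumed tuning knob).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * p_prosody + (1.0 - weight) * p_lexical


def predict_boundaries(prosody_probs, lexical_probs, weight=0.5, threshold=0.5):
    """Return one True/False boundary decision per candidate word boundary."""
    return [
        combine(p, l, weight) >= threshold
        for p, l in zip(prosody_probs, lexical_probs)
    ]


# Hypothetical per-boundary probabilities from the two models.
prosody = [0.9, 0.2, 0.6]
lexical = [0.7, 0.1, 0.3]
print(predict_boundaries(prosody, lexical))
```

In this toy input the third boundary is favored by the prosodic model alone but falls below the threshold once the lexical estimate is averaged in, which is the kind of disagreement a combined model has to arbitrate.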
Features • For each potential boundary location: • Pause at boundary (raw and normalized by speaker) • Pause at word before boundary (is this a new ‘turn’ or part of continuous speech segment?) • Phone and rhyme duration (normalized by inherent duration) (phrase-final lengthening?) • F0 (smoothed and stylized): reset, range (topline, baseline), slope and continuity
Voice quality (halving/doubling estimates as correlates of creak or glottalization) • Speaker change, time from start of turn, # turns in conversation and gender • Trained/tested on Switchboard and Broadcast News
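The "raw and normalized by speaker" pause feature can be illustrated with a short sketch. The slides do not say which normalization was used, so the per-speaker z-score below is an assumption, and the function name and toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_pauses(boundaries):
    """boundaries: list of (speaker_id, pause_seconds) tuples.

    Returns (speaker_id, raw_pause, z_score) tuples in the same order,
    where z_score is computed against that speaker's own pauses.
    """
    by_speaker = defaultdict(list)
    for speaker, pause in boundaries:
        by_speaker[speaker].append(pause)

    # Per-speaker mean and (population) standard deviation.
    stats = {s: (mean(p), pstdev(p)) for s, p in by_speaker.items()}

    out = []
    for speaker, pause in boundaries:
        mu, sigma = stats[speaker]
        # Guard against speakers with a single pause or constant pauses.
        z = (pause - mu) / sigma if sigma > 0 else 0.0
        out.append((speaker, pause, z))
    return out


# Hypothetical data: speaker B pauses longer than speaker A in general.
toy = [("A", 0.10), ("A", 0.20), ("A", 0.90),
       ("B", 0.50), ("B", 0.60), ("B", 1.60)]
for row in normalize_pauses(toy):
    print(row)
```

With this toy data, B's 0.60 s pause gets a negative score because it is short for that speaker, while A's 0.90 s pause gets a large positive one. That is the motivation for keeping both the raw and the normalized value as features.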
Sentence segmentation results • Prosodic features • Better than LM for BN • Worse than LM on hand transcripts, and the same on ASR transcripts, for SB • All better than chance • Useful features for BN • Pause at boundary, turn/no turn, f0 diff across boundary, rhyme duration
Useful features for SB • Phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn
Topic segmentation results (BN only): • Useful features • Pause at boundary, f0 range, turn/no turn, gender, time in turn • Prosody alone better than LM • Combined model improves significantly
Speech Act Theory • John Searle • Locutionary acts: semantic meaning • Illocutionary acts: ask, promise, answer, threaten • Perlocutionary acts: Effect intended to be produced on hearer: regret, fear • Dialogue acts • Many tagging schemes (e.g. DAMSL)
Practical Motivations: Spoken Dialogue Systems • Add more information about speaker intentions • Disambiguate ambiguous utterances • Okay • Um • Right
Experimental Evidence: Nickerson & Chu-Carroll ’99 • Can/would/would...willing questions • Can you move the piano? • Would you move the piano? • Would you be willing to move the piano? • À la Sag & Liberman ’75: can intonation disambiguate?
Experiments • Production studies: • Subjects read ambiguous questions in disambiguating contexts • Control for given/new and contrastiveness • Polite/neutral/impolite • Problems: • Cells imbalanced • No pretesting
No distractors • Same speaker reads both contexts
Results • Indirect requests • If L%, more likely (73%) to be indirect • 46% H%: differences in height of boundary tone? • Politeness: “can” differs in impolite (higher rise) vs. neutral • Variation in speaker strategy
Corpus Studies: Jurafsky et al ‘98 • Lexical, acoustic/prosodic/syntactic differentiators for yeah, ok, uhuh, mhmm, um… • Continuers: Mhmm (not taking floor) • Assessments: Mhmm (tasty) • Agreements: Mhmm (I agree) • Yes answers: Mhmm (That’s right) • Incipient speakership: Mhmm (taking floor)
Corpus Study • Switchboard telephone conversation corpus • Hand segmented and labeled with DA information (initially from text) • Relabeled for this study • Analyzed for • Lexical realization • F0 and rms features • Syntactic patterns
Results: Lexical Differences • Agreements • yeah (36%), right (11%),... • Continuer • uhuh (45%), yeah (27%),… • Incipient speaker • yeah (59%), uhuh (17%), right (7%),… • Yes-answer • yeah (56%), yes (17%), uhuh (14%),...
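A short worked example shows why lexical form alone does not settle the dialogue act. It takes the four "yeah" percentages reported above as P(yeah | act) and applies Bayes' rule. The uniform prior over just these four acts is an assumption made for illustration; the real class frequencies in Switchboard are not given on the slide.

```python
# P(word = "yeah" | dialogue act), copied from the slide above.
p_yeah_given_da = {
    "agreement": 0.36,
    "continuer": 0.27,
    "incipient speaker": 0.59,
    "yes-answer": 0.56,
}


def posterior_uniform_prior(likelihoods):
    """Bayes' rule with an assumed uniform prior: normalize the likelihoods."""
    total = sum(likelihoods.values())
    return {da: p / total for da, p in likelihoods.items()}


posterior = posterior_uniform_prior(p_yeah_given_da)
for da, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{da:18s} {p:.2f}")
```

Under these assumptions no act receives more than about a third of the probability mass after hearing "yeah", which is why the study turns to prosodic and syntactic cues.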
Results: Prosodic and Syntactic Cues • Relabeling from speech produces only 2% changed labels overall (114/5757) • 43/987 continuers --> agreements • Why? • Shorter duration, lower F0, lower energy, longer preceding pause • Over all DAs, duration best differentiator but… • Highly correlated with length in words • Assessments: That’s X (good, great, fine,…)
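The relabeling figures can be checked with a few lines of arithmetic. The sketch below assumes that 987 is the number of utterances originally labeled as continuers, which is how the slide reads; the helper name is hypothetical.

```python
def rate(changed, total):
    """Fraction of labels that changed."""
    return changed / total


overall = rate(114, 5757)                # all labels changed by listening
continuer_to_agreement = rate(43, 987)   # continuers relabeled as agreements

print(f"overall relabeling rate:   {overall:.1%}")
print(f"continuer -> agreement:    {continuer_to_agreement:.1%}")
```

So continuers were relabeled as agreements at roughly twice the overall relabeling rate (about 4.4% against about 2.0%), which is what makes this pair worth a closer prosodic look.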
Future Work • Speaker differences? • Higher level prosodic differences among DAs of ambiguous words?
Next Week • Turn-taking and disfluencies