A Question of Questions: Prosodic Cues to Question Form and Function • Julia Hirschberg (joint work with Jennifer Venditti and Jackson Liscombe)
Questioning in Dialogue • A fundamental activity in conversation • Elicit information • Elicit action • But • How to define a question? • Bolinger ’57: “fundamentally an attitude…an utterance that ‘craves’ a verbal or other semiotic … response” • Ginzburg & Sag ’00: “the semantic object associated with the attitude of wondering and the speech act of questioning” • How to identify a question as such? • How to represent its semantics? The intention of the questioner?
Distinguishing Question Form and Function • Questions may take many syntactic forms • Is it a question? What is a question? It’s a question, isn’t it? Is it a question or an answer? Right? It’s a question? • Questions may serve many pragmatic functions • Clarification-seeking? Information-seeking? Confirmation-seeking? • Possible Indicators • Syntactic cues • Context • Intonation
Questions in Spoken Dialogue Systems • Goals • Examine question form and function • How are they related? • What features characterize them? • Identify form and function automatically in an Intelligent Tutoring domain
Previous Studies • Integrating a prosodic decision-tree model with a word-based language model yields the best accuracy in detecting questions and question form (Shriberg et al. ’98: English) • Some corpus-based (MapTask) studies have examined tune and accent types with respect to question function (Kowtko ’96: Glaswegian English; Grice et al. ’95: German, Italian, Bulgarian) • Studies of different types (functions) of clarification questions (Rodríguez & Schlangen ’04: German; Edlund et al. ’05: Swedish) • Our goal: a comprehensive quantitative analysis of question form and function in English that will permit automatic identification of question form and function
Domain: Intelligent Tutoring Systems • ITSs must be able to recognize both the form and function of student questions • Students ask human tutors many questions • More questions → better learning • Different question FORMs seek different information • e.g. polar questions seek a yes-no answer • wh-questions seek different information • Different question FUNCTIONs also often require different types of answers
Wh-questions, e.g. • Information-seeking: (S has just submitted an essay to the tutor) • S: Ok, what do you think about that? • T: Uh, well that uh you have uh there are too many parameters here which uh need definition ... • Clarification-seeking: • T: So if there is if the only force on an object in earth’s gravity then what is its motion called? • S: What was the motion called? • T: Yes, what’s the name for this motion?
Yes-no questions, e.g. • Information-seeking → tutor provides additional information • Clarification-seeking → clarification subdialogue • Successful ITSs must be able to recognize the presence of a question in a student turn, as well as its form and function
Question Corpus • Human-human tutoring dialogs collected by Litman et al. ’04 for development of ITSpoke, a speech-enabled ITS designed to teach physics • Why2-Atlas (Kurt VanLehn (U. Pitt), Art Graesser (U. Memphis)) • Corpus includes 1030 student questions • ‘Question’ defined à la Bolinger ’57 as “an utterance that craves a response” • 25.2 Qs/hour • 13.3% of total student speaking time • This study: a subset of 643 tokens
Question Detection • what symbol are you talking about • do i have to rewrite this again • am i ok with that • so it’d be one meter per second squared
Coding question type • Form coding based on surface syntax • Declarative question (dQ): It’s a vector? A vector? • Yes-no question (ynQ): Is it a vector? • Wh-question (whQ): What is a vector? • Tag question (ynTAG): It’s a vector, isn’t it? • Alternative question (altQ): Is it a vector or a scalar? • Particle (part): Huh? • Function coding derived from Stenström ‘84 • Confirmation-seeking check question (chk) • Clarification-seeking question (clar) • Information-seeking question (info) • Other (oth)
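The form coding above relies on surface syntax, so it can be roughly approximated with a few ordered rules. The following is only a minimal sketch, not the coding procedure used in the study: the word lists, the tag-question pattern, and the function name `code_question_form` are all assumptions made for illustration. It assumes the utterance is already known to be a question.

```python
import re

# Hypothetical word lists, chosen for illustration only.
WH_WORDS = {"what", "which", "who", "whom", "whose", "when", "where", "why", "how"}
AUXILIARIES = {"is", "are", "am", "was", "were", "do", "does", "did", "can",
               "could", "will", "would", "should", "may", "might", "must",
               "have", "has", "had"}
PARTICLES = {"huh", "hm", "hmm", "eh"}

# Assumed shape of a tag: ", <auxiliary>[n't] <pronoun>" at the end of the utterance.
TAG_PATTERN = re.compile(
    r",\s*(?:is|are|was|were|do|does|did|can|could|will|would|should|has|have|had)"
    r"(?:n't)?\s+(?:it|they|he|she|we|you|i|there)\s*\??$"
)

def code_question_form(utterance: str) -> str:
    """Return one of the form labels: part, ynTAG, whQ, altQ, ynQ, dQ."""
    text = utterance.strip().lower().replace("’", "'")
    words = re.findall(r"[a-z']+", text)
    if not words:
        raise ValueError("empty utterance")
    if len(words) == 1 and words[0] in PARTICLES:
        return "part"                      # Huh?
    if TAG_PATTERN.search(text):
        return "ynTAG"                     # It's a vector, isn't it?
    if words[0] in WH_WORDS:
        return "whQ"                       # What is a vector?
    if words[0] in AUXILIARIES:            # subject/auxiliary inversion
        return "altQ" if "or" in words else "ynQ"
    return "dQ"                            # declarative word order
```

Rules are checked from most to least specific, so anything without an explicit syntactic cue falls through to the declarative label. Function coding (chk, clar, info, oth) depends on dialogue context and is not attempted here.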
F0 measures of non-falling questions • Quantitative analysis of F0 height in the 573 non-falling tokens w/sufficient data for analysis • Examined question nucleus (nucF0) and tail (btF0) only • Speaker-normalized (z-score) F0 of: • 1. nuclear accent (nucF0) • 2. rightmost edge of question (btF0) • 3. difference between 1 & 2 (riserange)
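The three measures above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it assumes each speaker's F0 distribution is summarized by the mean and population standard deviation of that speaker's F0 samples, and that riserange is computed as btF0 minus nucF0 (the slide gives only "difference between 1 & 2", so the sign convention is a guess). The function names are hypothetical.

```python
from statistics import mean, pstdev

def zscore(value_hz, speaker_f0_hz):
    """Normalize one F0 value against a speaker's own F0 samples."""
    mu = mean(speaker_f0_hz)
    sigma = pstdev(speaker_f0_hz)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("speaker F0 samples have no variance")
    return (value_hz - mu) / sigma

def question_f0_measures(nuc_hz, edge_hz, speaker_f0_hz):
    """Return speaker-normalized nucF0, btF0, and their difference."""
    nuc = zscore(nuc_hz, speaker_f0_hz)
    bt = zscore(edge_hz, speaker_f0_hz)
    # Assumed sign convention: positive riserange means the edge is above the nucleus.
    return {"nucF0": nuc, "btF0": bt, "riserange": bt - nuc}
```

For example, with speaker samples of 100 Hz and 200 Hz (mean 150, standard deviation 50), a nuclear accent at 150 Hz and a right edge at 250 Hz give nucF0 = 0, btF0 = 2, and riserange = 2.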
Question Form and F0 • DeclQs and YNQs both thought to rise (H*H-H% vs. L*H-H%?): Are there F0 height differences between them? • 2-way ANOVA on form x function: FORM: nucF0: F(5)=19.34, p=0; btF0: F(5)=10.71, p=0; riserange: F(5)=3.6, p<.01 • Planned comparisons (Tukey, alpha=.01) show no difference between declarative Qs and yes-no Qs • Main effect of form caused by yes-no tags (low F0) and particles (high F0)
[Figure: normalized mean F0 at nucF0 and btF0 by question function (chk, clar, info)]
Question Function and F0 • Question dialog acts thought to correlate with F0: Does question FUNCTION affect F0? • 2-way ANOVA on form x function: FUNCTION: nucF0: F(3)=16.6, p=0; btF0: F(3)=8.56, p<.001; riserange: F(3)=3.94, p<.01 • Main effect; planned comparisons show: • clarQ > chkQ (nucF0 & btF0) • infoQ > clarQ/chkQ (nucF0) • No interactions for any measure
Clarification types and F0 • Clark ’96 levels of coordination: sources of communication problems
Effects of Clarification Type • One-way ANOVA combining levels 1 & 2 into a single acoustic/perceptual category: nucF0: F(3)=5.41, p=.001; btF0: F(3)=6.6, p<.001; riserange: F(3)=2.59, p=.05 • Main effect for clarification type • Ranking for each measure, from higher F0 to lower F0: acoust/percept > understanding > NIR > intention • Planned comparisons (Tukey, alpha=.01) show that the only significant comparison was acoust/percept > intention
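For readers unfamiliar with the statistic reported above, a one-way ANOVA F value is the ratio of between-group to within-group mean squares. The sketch below computes it in plain Python. It is a generic textbook formulation, not the authors' analysis script, and the function name `one_way_anova_f` is hypothetical. With the four clarification categories on this slide, the between-group degrees of freedom are 4 - 1 = 3, which matches the F(3) notation.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = mean(x for g in groups for x in g)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    df_between = k - 1
    df_within = n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within
```

On the made-up groups `[[1, 2, 3], [2, 3, 4], [5, 6, 7]]` the between-group sum of squares is 26 and the within-group sum of squares is 6, giving F(2, 6) = 13. A p-value and the Tukey comparisons would require a statistics package and are left out.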
Can Prosody Distinguish Question Form? Question Function? • Only a few question forms prosodically distinct in our study – lexico/syntactic information can help • Question function more successfully differentiated prosodically – where there is less reliable lexico/syntactic information • Can we use prosodic information with lexico-syntactic information to help identify question form and function automatically?
Detecting Student Questions • Syntax • Wh-words, subject/auxiliary inversion • Prosody • Phrase-final rising intonation (Pierrehumbert & Hirschberg ‘90) • Duration and pausing (Shriberg et al. ‘98) • Lexico-pragmatics • personal pronouns, utterance-initial pronouns (Geluykens 1987; Beun 1990)
Corpus • 141 ITSpoke dialogues • 5 hours of student speech • Student turns average 2.5 seconds • 1,030 questions • 25 questions per hour • 70% of turns consist entirely of the question • 89% of questions are turn-final
Question-Bearing Turns • Contain one or more questions • N = 918
Features Extracted • Prosodic • pitch • loudness • pausing • speaking rate • calculated over entire turn and last 200 ms • Syntactic • unigram and bigram part-of-speech tags
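One of the prosodic features above, pitch slope over the entire turn and over the last 200 ms, can be sketched as a least-squares line fit to the voiced F0 samples. This is an assumption about how such a slope might be computed, not a description of the feature extractor used in the study. The function name `pitch_slope`, the millisecond time stamps, and the convention that unvoiced frames carry an F0 of 0 or `None` are all hypothetical.

```python
def pitch_slope(times_ms, f0_hz, window_ms=None):
    """Least-squares F0 slope in Hz per second.

    times_ms  -- frame times in milliseconds, ascending
    f0_hz     -- F0 per frame; 0 or None marks an unvoiced frame (assumption)
    window_ms -- if given, use only frames in the final window_ms of the turn
    """
    if not times_ms:
        return None
    start = times_ms[-1] - window_ms if window_ms is not None else times_ms[0]
    points = [(t / 1000.0, f) for t, f in zip(times_ms, f0_hz)
              if t >= start and f is not None and f > 0]
    if len(points) < 2:
        return None  # not enough voiced frames to fit a line
    mean_t = sum(t for t, _ in points) / len(points)
    mean_f = sum(f for _, f in points) / len(points)
    denom = sum((t - mean_t) ** 2 for t, _ in points)
    if denom == 0:
        return None
    return sum((t - mean_t) * (f - mean_f) for t, f in points) / denom
```

Calling `pitch_slope(times, f0)` gives the whole-turn slope and `pitch_slope(times, f0, window_ms=200)` the turn-final slope. A positive turn-final value corresponds to the phrase-final rise discussed earlier.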
Feature Extraction • Lexical • word unigrams and bigrams from hand-labeled transcriptions • Student and task dependent • pre-test score • gender • correctness • previous tutor dialogue act
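Unigram and bigram features, whether over words or over part-of-speech tags, can be sketched as simple counts over a token sequence. The feature-naming scheme and the function name `ngram_counts` below are illustrative assumptions. The slides do not say how the features were encoded.

```python
from collections import Counter

def ngram_counts(tokens, prefix):
    """Count unigrams and bigrams in a token list.

    prefix distinguishes feature families, e.g. "w" for words or
    "pos" for part-of-speech tags (hypothetical naming scheme).
    """
    feats = Counter()
    for tok in tokens:
        feats[f"{prefix}_1g={tok}"] += 1
    for first, second in zip(tokens, tokens[1:]):
        feats[f"{prefix}_2g={first}_{second}"] += 1
    return feats
```

For the turn "what symbol are you talking about", `ngram_counts(turn.split(), "w")` yields six unigram and five bigram features. The same function applied to a tag sequence gives the syntactic counterpart.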
Machine Learning Experiments • Question-bearing vs. non-question-bearing turns • Down-sampled to a 50/50 distribution • Experimented by feature type • AdaBoosted C4.5 decision trees • 5-fold cross-validation • Best results with all features • Accuracy = 79.7% • Precision = Recall = F-measure = 0.8
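Two steps of the setup above, down-sampling to a 50/50 class distribution and scoring with precision, recall, and F-measure, can be sketched in plain Python. The classifier itself (boosted C4.5 trees) is not reproduced here. The function names, the `(features, label)` example format, and the fixed random seed are assumptions for illustration.

```python
import random

def downsample_to_balance(examples, seed=0):
    """Randomly drop majority-class examples until both classes are equal in size.

    examples -- list of (features, label) pairs with a boolean label,
                True for question-bearing turns (assumed format)
    """
    positives = [e for e in examples if e[1]]
    negatives = [e for e in examples if not e[1]]
    n = min(len(positives), len(negatives))
    rng = random.Random(seed)
    balanced = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(balanced)
    return balanced

def precision_recall_f1(gold, predicted):
    """Precision, recall, and F1 for the positive (question-bearing) class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Because F1 is the harmonic mean of precision and recall, it equals both whenever they are equal, which is consistent with the slide reporting a single value for all three.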
Feature Type Discussion • Which features most informative? • pitch slope of last 200 ms and entire turn • maximum and mean pitch of turn • Which features most often used in learning? • pre-test score • slope of last 200 ms • maximum pitch of entire turn • cumulative pause duration
Other Observations • Syntactic features were informative • personal pronoun + verb, wh-pronoun, interjection • Lexical features were informative • yes, right, what, I, you
Conclusions • Most questions in our tutoring corpus are declarative in form • More than syntax is needed to identify these as questions • Prosodic features are very important • Detecting question-bearing turns is possible • Detecting question function is needed