
Local and Global Adaptation in Hyperarticulation

Amanda Stent, Susan Brennan, Marie Huffman


Presentation Transcript


  1. Local and Global Adaptation in Hyperarticulation Amanda Stent, Susan Brennan, Marie Huffman

  2. Outline • Introduction: Adaptation • User adaptation: Hyperarticulation • Current and future work

  3. Adaptation in Spoken Dialog • There is considerable variation in spoken dialog. Much of this variation is designed to be adaptive. • Speakers may converge (e.g. Brennan and Clark 96) or complement each other (e.g. Oviatt 95, Brennan 90). • Adaptation may be partner-specific or generic (Brown and Dell 87). • Adaptation may be local or global.

  4. Adaptation in Spoken Dialog • Interesting questions include: • How do humans adapt to each other in spoken dialog? • Speaking style, e.g. dialect, speaking rate • Lexical and syntactic choices • Initiative • Are these adaptations partner-specific or generic, local or global? • How does adaptation in human-computer dialog differ from adaptation in human-human dialog? • Can we use adaptation in human-computer dialog to improve dialog outcomes?

  5. Adaptation: The User • Humans adapt to their dialog partners, including computers, at many levels: • Phonetic (e.g. hyperarticulation) • Lexical/syntactic (e.g. producing simpler utterances, rephrasing, mirroring the system’s choice of words) • Dialog and task (e.g. skipping acknowledgments, following system initiative) • Some of these adaptations reflect incorrect models of the conversational partner, and/or are known to be maladaptive (e.g. hyperarticulation, some rephrasing).

  6. Adaptation: The System • Systems can adapt to make the user feel more comfortable or to mimic human adaptations (responsive generation). • Converging on the user’s choice of referring expression. • Following the user’s topic shifts. • Systems can construct interactions that guide the user to useful forms of adaptation (directive generation). • Using words that can be recognized/parsed. • Suggesting rephrases on misrecognition. • Presenting their capabilities accurately.

  7. Outline • Introduction: Adaptation • User adaptation: Hyperarticulation • Current and future work

  8. Experiment: • The problem • Hypotheses • Experiment design • Experiment results • Discussion

  9. The Problem • When users experience speech recognition errors, they try to adapt in ways that do not improve recognition performance • Hyperarticulation (Soltau and Waibel 98, Wade et al. 92) • Rephrasing into out-of-grammar utterances (Fischer 99, Choularton and Dale 04) • Our question: • Considered as a form of adaptation, how exactly does hyperarticulation function?

  10. Hypotheses • In repairs of misrecognitions, subjects will exhibit hyperarticulation. • Slower speaking rate, longer pauses, more careful speech (Oviatt et al. 98; Levow 98, 99; Hirschberg et al. 99, 00) • (Local impact) Hyperarticulation will be more likely to appear around the actual misrecognition than elsewhere in the utterance. • (Global impact) Once users start hyperarticulating, this behaviour will persist even after errors stop occurring.

  11. Experiment Design • Wizard-of-Oz procedure • Subjects answered prerecorded questions about a children’s softball team database. • Subjects were told to answer in complete sentences and to repeat each answer until it was heard correctly. • System feedback was provided in text. • Usually “I heard you say …” • For unplanned errors by subjects (e.g. disfluencies, use of pronouns or ellipsis, incomplete utterances), other feedback was provided. • For selected planned error utterances, system feedback contained misrecognitions.

  12. Example: Unplanned Error • Q. What is Ryan Dade bringing to the food sale? • U. Ryan Dade is bringing cat collars, and a basket, and pet toys to the foo, to the garage sale, oops (unplanned error) • S. Please repeat • U. Ryan Dade is bringing cat collars, a basket, and pet toys to the garage sale (repair)

  13. Example: Planned Error • Q. What is Kate Tolstoy bringing to the food sale? • U. Kate Tolstoy is bringing some cookie dough and a picnic table to the food sale (planned error) • S. You said: Kate Tolstoy is bringing some cooking label in a pickle to the food sale • U. Kate Tolstoy is bringing some cookie dough and a picnic table to the food sale (repair) • S. You said: Kate Tolstoy is bringing some cookie dough and a picnic table to the food sale

  14. Measurements • Speaking rate (syllables/sec.) • Average pause length (ms.) • Phonetic features indicating careful speech: • Mid-word /t/ vs. tap/flap /D/ • e.g. Peter, tutor, party, forty, writer • Word-final /t/ release vs. non-release • e.g. Kate, scientist, peat, flute, dart • /t/ release after /n/ vs. non-release • e.g. Kanter, scientist, Planters, dentist, Santa • Tense a in indefinite articles • /d/ in and

  15. Measurements • Local impact of hyperarticulation • Target phonemes were coded for all planned and unplanned errors and all repairs • Global impact of hyperarticulation • Planned errors were placed so that very few errors occurred in the first third of each dialog, errors occurred every 1-3 utterances in the second third, and a run of 5 consecutive errors occurred in the last third • Impact of hyperarticulation on speech recognition (SR): • All utterances were run through two speech recognizers, one grammar-based and one statistical
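The planned-error placement described above can be sketched as a schedule generator. This is a minimal illustration, not the procedure actually used in the experiment: the function name `planned_error_schedule`, the choice of exactly one error in the first third, the random gaps of 1 to 3 utterances in the second third, and the random position of the final run are all assumptions.

```python
import random


def planned_error_schedule(n_utts: int = 66, seed: int = 0) -> list[bool]:
    """Hypothetical helper: True marks an utterance whose feedback will
    contain a planned misrecognition."""
    if n_utts < 15:
        raise ValueError("need at least 15 utterances to fit all three phases")
    rng = random.Random(seed)
    third = n_utts // 3
    flags = [False] * n_utts

    # First third: very few errors (here exactly one, at random; an assumption).
    flags[rng.randrange(third)] = True

    # Second third: an error every 1-3 utterances.
    i = third + rng.randint(0, 2)
    while i < 2 * third:
        flags[i] = True
        i += rng.randint(1, 3)

    # Last third: a single run of 5 consecutive errors at a random position.
    start = rng.randrange(2 * third, n_utts - 4)
    for j in range(start, start + 5):
        flags[j] = True
    return flags
```

With a fixed seed, `planned_error_schedule(66, seed=1)` returns the same 66 flags every time, so one schedule could be reused across subjects if that were desired.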

  16. Experiment • 16 subjects (9 women, 7 men, mean age 22 years) participated in the experiment • All native speakers of English • 10 monolingual, 6 bilingual but English-dominant • Each answered 66 questions • 2 additional subjects’ data were discarded due to equipment failure • Some utterances were discarded due to major disfluencies or being cut off • Result: 1202 utterances, including 373 planned errors and repairs

  17. Data Coding • Utterance length, number of words, and number of syllables were computed automatically • PRAAT was used to measure the number and length of utterance-internal pauses longer than 10 ms • Phonetic annotation of target words in errors and repairs was done by hand
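Given per-utterance annotations, the two global measures reduce to simple arithmetic. The sketch below assumes a hypothetical `Utterance` record holding the total duration, the syllable count, and the lengths of silent intervals (as they might be exported from PRAAT); whether the authors' speaking rate included or excluded pause time is not stated, so including it is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    duration_s: float        # total utterance duration in seconds
    syllables: int           # syllable count (computed automatically)
    pauses_ms: list[float]   # lengths of utterance-internal silent intervals


def speaking_rate(utt: Utterance) -> float:
    """Syllables per second over the whole utterance (pauses included; an assumption)."""
    return utt.syllables / utt.duration_s


def pause_stats(utt: Utterance, min_pause_ms: float = 10.0) -> tuple[int, float]:
    """Number and mean length (ms) of pauses longer than min_pause_ms."""
    pauses = [p for p in utt.pauses_ms if p > min_pause_ms]
    if not pauses:
        return 0, 0.0
    return len(pauses), sum(pauses) / len(pauses)
```

For example, a 4-second utterance with 16 syllables and silences of 5, 120 and 80 ms has a rate of 4.0 syl./sec and two pauses above the 10 ms threshold, averaging 100 ms.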

  18. Measures of Hyperarticulation: Speaking Rate • Speaking rate and clear speech are reliably negatively correlated (r = -.239, p < .001): faster utterances contain fewer clear forms. • Speakers spoke more slowly in a repair than in the corresponding planned error, 3.62 vs. 4.12 syl./sec (p ≤ .001). • For all paired utterances taken together, repairs were slower than errors, 3.67 vs. 4.17 syl./sec (p < .001)
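The slides do not say which statistical tests produced these p-values. As an illustration only, the sketch below computes Pearson's r between two per-utterance measures and a paired t statistic for repair-minus-error differences, one common choice for paired utterances; the function names are hypothetical, and turning t into a p-value would additionally need a t-distribution CDF (for example from SciPy), which is omitted here.

```python
import math
from statistics import mean, stdev


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def paired_t(error_rates: list[float], repair_rates: list[float]) -> tuple[float, float]:
    """Mean repair-minus-error difference and the paired t statistic (df = n - 1)."""
    diffs = [r - e for e, r in zip(error_rates, repair_rates)]
    d_mean = mean(diffs)
    t = d_mean / (stdev(diffs) / math.sqrt(len(diffs)))
    return d_mean, t
```

Read against the slide: a negative r between speaking rate and proportion of clear forms means faster utterances are less clearly articulated, and a negative mean difference in `paired_t` means repairs were slower than the errors they repaired.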

  19. Measures of Hyperarticulation: Careful Speech • On average, speakers produced more clear forms in repairs than in errors, 38% vs. 30% (p < .001). • Of the 5 phonetic features coded for the paired utterances: • 3 were more likely to be pronounced in their clear forms in the repair than in the error: mid-word /t/ vs. tap/flap /D/, word-final /t/ release vs. non-release, /t/ release vs. non-release after /n/. • and 2 were not: tense a in indefinite articles, and /d/ in and.

  20. Measures of Hyperarticulation: Careful Speech • Content words were produced in clear form 13% more often in a repair than in an error (p = .002). • Function words were produced in clear form only 4% more often in a repair than in an error (p = .002).

  21. Local Impact of Hyperarticulation • Do speakers hyperarticulate as a precise form of correction aimed at repairing the most troublesome part of the utterance? • The percentage of clear forms in the misunderstood portion increased by 12% in the repair, significantly more than in the portions before and after it (only 4.3% and 4.7%, respectively).
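The portion-by-portion comparison can be computed from the hand-coded target tokens. A minimal sketch, assuming each token is stored as a (portion, utterance type, is-clear) triple, where the portion is "before", "during" or "after" the misrecognized words; the data layout and the function name are hypothetical, and reporting the change in percentage points is an assumption about how the slide's figures were derived.

```python
from collections import defaultdict


def clear_form_change(tokens):
    """tokens: iterable of (portion, utt_type, is_clear) with
    portion in {"before", "during", "after"} and utt_type in {"error", "repair"}.
    Returns {portion: repair % clear minus error % clear}, in percentage points."""
    counts = defaultdict(lambda: [0, 0])  # (portion, utt_type) -> [clear, total]
    for portion, utt_type, is_clear in tokens:
        cell = counts[(portion, utt_type)]
        cell[0] += int(is_clear)
        cell[1] += 1

    change = {}
    for portion in ("before", "during", "after"):
        err_clear, err_total = counts[(portion, "error")]
        rep_clear, rep_total = counts[(portion, "repair")]
        if err_total and rep_total:  # skip portions with no coded tokens
            change[portion] = 100 * (rep_clear / rep_total - err_clear / err_total)
    return change
```

Under the local-impact hypothesis, the value for "during" should exceed the values for "before" and "after", which is the pattern the slide reports.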

  22. Global Impact of Hyperarticulation • Is hyperarticulation a “switch” or a “dial”? • The closer an utterance was to the most recent previous error, the more carefully it was produced (speaking rate, clear forms) (p < .005). • Speakers gradually return to relaxed speech about 4-7 utterances after seeing evidence of misrecognition.
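The predictor "closeness to the most recent previous error" can be derived from a per-dialog sequence of error flags. A minimal sketch with a hypothetical function name; the slides do not say which model the authors used to relate this distance to speaking rate and clear forms.

```python
from typing import Optional


def utterances_since_error(error_flags: list[bool]) -> list[Optional[int]]:
    """For each utterance, how many utterances back the most recent previous
    error occurred (None if no error has occurred yet in the dialog)."""
    distances: list[Optional[int]] = []
    last_error: Optional[int] = None
    for i, is_error in enumerate(error_flags):
        distances.append(None if last_error is None else i - last_error)
        if is_error:
            last_error = i
    return distances
```

For the flag sequence `[False, True, False, False, True, False]` this yields `[None, None, 1, 2, 3, 1]`; a repair immediately following an error has distance 1, and per the slide, utterances with small distances tend to be slower and clearer.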

  23. Individual Differences • Individual speakers displayed substantial variability in average speaking rate (2.43 to 5.27 syl./sec). • But all speakers slowed their speaking rate during repairs, relative to before repairs (by .04 to 1.33 syl./sec).

  24. Individual Differences • All but 3 speakers produced more clear speech during repairs than before repairs. • Speaking rate and careful speech were correlated across speakers; that is, those who spoke rapidly tended to produce more relaxed forms and those who spoke slowly tended to produce more clear forms.

  25. Individual Differences • A few speakers adopted a hyperarticulate style of speaking throughout the experiment; those who experienced the most unplanned errors spoke the slowest during non-repairs. • Both monolingual and bilingual speakers slowed their speaking rate equally during repairs (and there was no difference in average speaking rates of monolinguals versus bilinguals). However, monolinguals increased their proportion of clear speech marginally more than did bilinguals.

  26. Impact on Speech Recognition • For the statistical speech recognizer, higher word error rates were associated with slower speech (p < .001) but not with more careful speech. • For the grammar-based recognizer, higher word error rates were correlated with faster speech (p < .001), and with more careful speech (p = .05). • For both recognizers, the effect sizes are rather small by the standards of Cohen 88.
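The slides do not describe how word error rate was scored. The sketch below shows the standard definition, the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, assuming simple whitespace tokenization and no case or punctuation normalization; the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

Applied to the planned-error example above (15 reference words; "cookie dough and a picnic table" heard as "cooking label in a pickle"), the alignment needs 4 substitutions and 1 deletion, giving a WER of 5/15, about 0.33.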

  27. Impact on Speech Recognition • As Wade et al. 92 found, not all aspects of hyperarticulation cause problems, and any effects depend a great deal on how the acoustic model was trained. • After a misrecognition, users’ rephrasing may cause more recognition problems than their switch to hyperarticulate speech.

  28. Discussion • Hyperarticulation varies both by location within the utterance and over time. • The type and degree of hyperarticulation depend somewhat on the individual speaker. • Once hyperarticulation has been detected, the system can try to guide the user away from hyperarticulation by modifying its behaviours (Hockey et al. 03). • However, hyperarticulation is not as maladaptive as rephrasing into out-of-grammar utterances.

  29. Outline • Introduction: Adaptation • User adaptation: Hyperarticulation • Current and future work

  30. Models of the System (Weaver et al.) • The problem • Experiment design • Preliminary results

  31. The Problem • Users may develop inaccurate models of dialog systems, leading to maladaptive interactions. • Our question: • How can we construct system behaviors that reduce user maladaptation?

  32. Experiment Design • Same as experiment 1, except: • Questions and system feedback provided using TTS. • Planned errors appear throughout dialog -- each phonetic category is represented in each quarter of the dialog, and in each location (before, during and after error). • Subjects assigned to one of two conditions: • (Graceful) System model is one of a system that understands human language. • (Nongraceful) System model is one of a system that recognizes but does not understand speech.

  33. Experiment Design • System model is presented to subjects in experiment setup, through choice of TTS voice, and through construction of planned errors. For example: • (True) Hunter Mariano plays #center# • (Graceful) Hunter Mariano plays #better# • Semantically and syntactically meaningful • (Nongraceful) Hunter Mariano plays #venture# • Phonetically similar, syntactically nonsensical

  34. Preliminary Results • Subjects hyperarticulate in repairs regardless of condition; however, there is a trend toward clearer speech in the nongraceful condition before errors. • Subjects in the graceful condition use less clear speech initially (26%, increasing to 44% on repairs). Their speaking rate slows down an average of .25 syl./sec on repairs. • Subjects in the nongraceful condition use more clear speech initially (38%, increasing to 49% on repairs). Their speaking rate slows down an average of .52 syl./sec on repairs.

  35. System Adaptation (Marge, Gerrig, Stent et al.) • Experiment design: • Subjects interact with a spoken dialog system to fill out a survey. • Two variables: initiative and lexical choice. • Initiative: • System chooses topics and their order (directive) • System chooses topics, user chooses order (mixed) • User chooses topics and their order (nondirective) • Lexical choice: • System does not adapt to user’s choice of topic labels, choice of tense (directive) • System does adapt to user’s choice of topic labels, choice of tense (adaptive)

  36. Directive Generation • Measures: • Initiative: • Topic choice, order • Requests for help, prompt repetition • Length of user responses • Number of hangups • Match between system’s and user’s estimate of user’s overall opinion of course • Lexical choice: • Number of misrecognitions • Pause length between prompt and response

  37. Conclusions • Variation in human-human dialog is omnipresent. • Much of it is purposeful or adaptive. • We do not know enough about adaptation in human-computer dialog. • We may be able to use humans’ tendencies to adapt to improve outcomes for spoken dialog systems.
