
Understanding Variation of VOT in spontaneous speech

Yao Yao, UC Berkeley, yaoyao@berkeley.edu


Presentation Transcript


  1. Understanding Variation of VOT in spontaneous speech • Yao Yao • UC Berkeley • yaoyao@berkeley.edu

  2. Overview • Background • Methodology • Data • Preliminary analysis • Regression model • Results • Discussion

  3. Overview • Background • Methodology • Data • Preliminary analysis • Regression model • Results • Discussion

  4. Background • Keywords • VOT • Variation • Spontaneous speech • VOT (Voice Onset Time) • The duration of time between consonant release and the beginning of voicing of the next vowel • Sensitive to speaker and speaking environment • [Figure: waveform with labels "close", "release", and "vowel onset"]
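The definition on slide 4 amounts to a simple subtraction. A minimal sketch, assuming the release (burst) time and the voicing onset time are available in seconds; the function name and the example timestamps are hypothetical and do not come from the study:

```python
def vot_ms(release_s: float, voicing_onset_s: float) -> float:
    """Return VOT in milliseconds: voicing onset time minus stop release time."""
    return (voicing_onset_s - release_s) * 1000.0


# Hypothetical token: burst at 1.250 s, voicing of the next vowel starts at 1.305 s.
example_vot = vot_ms(1.250, 1.305)
print(f"VOT = {example_vot:.1f} ms")
```

A negative value would indicate voicing that starts before the release, which is not expected for the voiceless stops studied here.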

  5. Background • What conditions the length of VOT? • Place of articulation (POA) • VOT increases as POA moves backward, i.e., [p] < [t] < [k] • Following vowel • Speaking rate • Age, gender • Dialectal background • Speech disorders • Lung volume • Hormone level • …

  6. Background • Why use spontaneous speech data? • Previous results are mostly based on experimental data or read speech. • The existence of large-scale transcribed speech corpora makes it possible to study patterns with “naturalistic” data. (Cf. Bell et al. 1999, Gahl in press, Raymond et al. 2006, etc.)

  7. Background • Experimental data • Controlled content • Easy to investigate individual factors • Hard to see the general pattern of variation • Not necessarily natural speech • Spontaneous data • Uncontrolled content • Need to statistically control for irrelevant factors • Provides a general picture of variation • More naturalistic; includes factors such as disfluency

  8. Background • Purpose of this study • To investigate some of the factors that have been shown to affect VOT in experiments, as well as those that have been proposed to influence spontaneous speech production • Main statistical tool • Linear regression • Adding variables step by step
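The "adding variables step by step" approach on slide 8 can be sketched as follows: fit an ordinary least squares model, add one predictor at a time, and record the cumulative R². This is an illustration under assumptions, not the author's script. The data values, predictor names, and helper functions are all hypothetical, and the solver is a plain-Python stand-in for a statistics package.

```python
from typing import Dict, List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Fit y ~ intercept + columns by ordinary least squares and return R^2."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def stepwise_r2(predictors: Dict[str, Sequence[float]],
                y: Sequence[float]) -> List[Tuple[str, float]]:
    """Add predictors one at a time (in dict order) and record the cumulative R^2."""
    used: List[Sequence[float]] = []
    out: List[Tuple[str, float]] = []
    for name, col in predictors.items():
        used.append(col)
        out.append((name, r_squared(used, y)))
    return out


# Made-up VOT values (ms) and predictors, for illustration only.
vot = [62.0, 48.0, 55.0, 40.0, 70.0, 35.0, 58.0, 44.0]
predictors = {
    "log_freq": [1.2, 3.1, 2.0, 3.8, 0.9, 4.2, 1.8, 3.0],
    "rate_syll_s": [4.0, 5.5, 4.8, 6.0, 3.9, 6.3, 5.2, 4.6],
}
for name, r2 in stepwise_r2(predictors, vot):
    print(f"after adding {name}: R^2 = {r2:.3f}")
```

The increase in R² at each step indicates how much additional variance the newly added variable accounts for, given the variables already in the model. The order of entry therefore matters when predictors are correlated.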

  9. Overview • Background • Methodology • Data • Preliminary analysis • Regression model • Results • Discussion

  10. Data • Buckeye corpus (Pitt et al. 2005) • 40 speakers • All residents of Columbus, Ohio • Balanced in age and gender • 1-hr interview per speaker • Transcribed at word and phone level • 19 speakers’ transcriptions were available at the time of this study

  11. Data • 2 speakers’ data are used for this study • F07: Older, female, low speaking rate (4.022 syllables/sec) • M08: Younger, male, high speaking rate (6.434 syllables/sec) • Target tokens • Word-initial transcribed voiceless stops (i.e., [p], [t], [k])

  12. Data • Finding the point of burst • An automatic algorithm is used first (cf. Yao 2007). • >70% of the tokens are checked manually. Error <3.5 ms. • Some tokens are rejected by the algorithm for not having a significant burst point.

  13. VOT by speaker • F07: Mean = 57.41 ms, SD = 26.00 ms • M08: Mean = 34.86 ms, SD = 19.82 ms

  14. Overview • Background • Methodology • Data • Preliminary analysis • Regression model • Results • Discussion

  15. Preliminary analysis: POA • [Figures: VOT by POA ([p], [t], [k]) in F07; VOT by POA ([p], [t], [k]) in M08]

  16. Preliminary analysis: Word class • Split the data set into three subsets • Content words • Function words • Other (e.g., proper names)

  17. Preliminary analysis: Word class • [Figures: VOT by word class (function, content, other) in F07; VOT by word class (function, content, other) in M08]

  18. Preliminary analysis: word class • Word class distinction or general effect of frequency?

  19. Preliminary analysis: word frequency • Two frequency measures: • Log of Celex frequency • Log of Buckeye frequency (speaker-specific) • The two measures are highly correlated (r=0.826) • Effect: more frequent words have shorter VOT
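To illustrate how two log-frequency measures like those on slide 19 can be compared, here is a minimal sketch of a Pearson correlation on log-transformed counts. The counts are invented, and the use of the natural logarithm is an assumption (the slides do not state the base). The value r=0.826 reported on the slide comes from the study's data, not from this example.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical raw counts for five words in a large lexical database and in
# one speaker's interview (all numbers are made up).
celex_counts = [1200, 85, 15000, 40, 560]
buckeye_counts = [30, 4, 210, 1, 12]

log_celex = [math.log(c) for c in celex_counts]
log_buckeye = [math.log(c) for c in buckeye_counts]

r = pearson_r(log_celex, log_buckeye)
print(f"r = {r:.3f}")
```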

  20. Word class vs. frequency • After factoring out the effect of word class, frequency is no longer significant in F07’s data (p=0.277), but it remains significant in M08’s data (p=0.003) • This suggests that the frequency effect observed in F07 is mainly due to word class. In other words, the effect of word class must be factored out before the effect of frequency can be studied.
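One way to picture "factoring out" word class, as on slide 20, is to remove each class's mean from both VOT and frequency before estimating the frequency slope. This gives the same slope as an OLS model that includes word-class dummy variables. The sketch below is an assumption about the procedure, not the analysis actually run; the data are constructed so that the frequency effect lies entirely between classes, and no p-value is computed.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def center_within_groups(values: Sequence[float],
                         groups: Sequence[Hashable]) -> List[float]:
    """Subtract each group's mean from the values belonging to that group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def slope_after_factoring_out(y: Sequence[float], x: Sequence[float],
                              groups: Sequence[Hashable]) -> float:
    """Slope of y on x after group means are removed from both variables."""
    yc = center_within_groups(y, groups)
    xc = center_within_groups(x, groups)
    return sum(a * b for a, b in zip(xc, yc)) / sum(a * a for a in xc)


# Made-up tokens: function words are frequent with short VOT, content words
# are rarer with long VOT, and frequency has no effect inside either class.
word_class = ["function"] * 4 + ["content"] * 4
log_freq = [8.0, 9.0, 8.0, 9.0, 3.0, 4.0, 3.0, 4.0]
vot = [30.0, 32.0, 32.0, 30.0, 55.0, 57.0, 57.0, 55.0]

raw_slope = slope_after_factoring_out(vot, log_freq, ["all"] * len(vot))
within_slope = slope_after_factoring_out(vot, log_freq, word_class)
print(f"slope ignoring word class:       {raw_slope:.2f} ms per log unit")
print(f"slope with word class removed:   {within_slope:.2f} ms per log unit")
```

In this constructed example the apparent frequency effect disappears once word class is accounted for, which is the pattern the slide reports for F07.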

  21. Overview • Background • Methodology • Data • Preliminary analysis • Regression model • Results • Discussion

  22. Linear regression model • We model only the variation in the content word set • F07: 155 tokens • M08: 346 tokens • Factors investigated • POA • Word frequency • Phonetic context • Speech rate • Utterance position

  23. Overview • Background • Methodology • Data • Preliminary analysis • Regression model • Results • Discussion

  24. Regression: POA • The canonical ordering [p] < [t] < [k] appears only in M08’s data, not in F07’s data.

  25. Regression: word frequency • In both speakers’ data, more frequent words tend to have shorter VOT, but the trends are not strongly significant. • For both speakers, the Buckeye frequency measure performs slightly better than the Celex frequency measure.

  26. Regression: word frequency • [Results for F07 and M08; figure or table not preserved in transcript]

  27. Regression: phonetic context • Two measures • Category of the previous phone • Coded as C(onsonant), V(owel), O(ther sound), and N(on-linguistic) • Category of the next phone • Coded as C(onsonant), V(owel), O(ther sound), and N(on-linguistic)
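The four-way coding on slide 27 can enter a regression as dummy variables. The sketch below is illustrative only. The phone inventories, the non-linguistic labels, and the choice of "C" as the reference level are all assumptions, and "O" is treated here as a fallback for any label not otherwise listed, which may differ from how the study defined it.

```python
from typing import Dict

# Hypothetical label sets; the corpus's actual label inventory may differ.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}
CONSONANTS = {"p", "t", "k", "b", "d", "g", "ch", "jh", "f", "v", "th", "dh",
              "s", "z", "sh", "zh", "hh", "m", "n", "ng", "l", "r", "w", "y"}
NONLINGUISTIC = {"SIL", "NOISE", "LAUGH"}


def phone_category(label: str) -> str:
    """Map a phone label to C, V, O, or N (O is the fallback category)."""
    if label in NONLINGUISTIC:
        return "N"
    lowered = label.lower()
    if lowered in VOWELS:
        return "V"
    if lowered in CONSONANTS:
        return "C"
    return "O"


def dummy_code(category: str, reference: str = "C") -> Dict[str, int]:
    """Dummy-code a category against a reference level (one 0/1 column per other level)."""
    return {f"is_{level}": int(category == level)
            for level in "CVON" if level != reference}


for label in ["iy", "s", "SIL", "xx"]:
    cat = phone_category(label)
    print(label, cat, dummy_code(cat))
```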

  28. Regression: Phonetic context • [Results for F07 and M08; figure or table not preserved in transcript]

  29. Regression: phonetic context • [Figures: VOT by previous phone category in F07; VOT by next phone category in M08]

  30. Regression: speech rate • Three speed measures • Duration of the next phone, in ms. • Average speed of a 3-word period centered at the target word, measured in # of syll/s. • Average speed of the pause-bounded stretch that contains the target word, measured in # of syll/s. • All speed measures predict that words in faster speech tend to have shorter VOT
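The two syllable-rate measures on slide 30 can be sketched as follows, assuming each token carries a start time, an end time, and a syllable count, and that pauses appear as tokens with no text. The `Token` structure, the timings, and the decision to clip the 3-word window at pause boundaries are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    text: Optional[str]  # None marks a pause
    start: float         # seconds
    end: float           # seconds
    n_syll: int = 0


def syll_per_sec(tokens: List[Token]) -> float:
    """Syllables per second over a contiguous run of word tokens."""
    duration = tokens[-1].end - tokens[0].start
    return sum(t.n_syll for t in tokens) / duration


def pause_bounded_stretch(tokens: List[Token], i: int) -> Tuple[List[Token], int]:
    """Return the pause-free run of words containing token i, and i's index in it."""
    lo = i
    while lo > 0 and tokens[lo - 1].text is not None:
        lo -= 1
    hi = i
    while hi < len(tokens) - 1 and tokens[hi + 1].text is not None:
        hi += 1
    return tokens[lo:hi + 1], i - lo


def three_word_window(stretch: List[Token], j: int) -> List[Token]:
    """Target word plus one neighbour on each side, clipped at the stretch edges."""
    return stretch[max(0, j - 1):j + 2]


# Made-up utterance: pause, "the cat table today", pause.
tokens = [
    Token(None, 0.0, 0.5),
    Token("the", 0.5, 0.6, 1),
    Token("cat", 0.6, 0.9, 1),
    Token("table", 0.9, 1.3, 2),
    Token("today", 1.3, 1.7, 2),
    Token(None, 1.7, 2.2),
]
stretch, j = pause_bounded_stretch(tokens, 3)  # target word: "table"
stretch_rate = syll_per_sec(stretch)
window_rate = syll_per_sec(three_word_window(stretch, j))
print(f"pause-bounded stretch: {stretch_rate:.2f} syll/s")
print(f"3-word window:         {window_rate:.2f} syll/s")
```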

  31. Regression: speech rate • [Results for F07 and M08; figure or table not preserved in transcript]

  32. Regression: utterance position • Utterance-final lengthening has been documented in the literature extensively. • We code tokens for whether they are followed by silence.
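The coding described on slide 32 reduces to checking the label that follows the target word. A minimal sketch, assuming the transcript is a flat list of labels and that silence is marked "SIL"; both the label and the treatment of the last item in the list as final are assumptions:

```python
from typing import List

SILENCE_LABELS = {"SIL"}  # assumed label for silence


def is_utterance_final(labels: List[str], i: int) -> bool:
    """True if the word at index i is followed by silence or ends the list."""
    return i + 1 >= len(labels) or labels[i + 1] in SILENCE_LABELS


labels = ["the", "table", "SIL", "okay"]
for i, word in enumerate(labels):
    if word not in SILENCE_LABELS:
        print(word, is_utterance_final(labels, i))
```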

  33. Regression: utterance position • [Figures: VOT of non-final vs. final tokens in F07; VOT of non-final vs. final tokens in M08]

  34. Regression: utterance position • F07: Utterance position contributes to the variation in VOT • M08: Utterance position doesn’t contribute to the variation in VOT

  35. Regression: complete model • [Results for F07 and M08; figure or table not preserved in transcript]

  36. Regression: trends observed • POA • [p] < [t] < [k] • Word class • Function words < content words • Word frequency • ??Higher frequency → shorter VOT

  37. Regression: trends observed • Phonetic category • ??Preceded by vowel → shorter VOT • ??Followed by vowel → longer VOT • Speaking rate • Faster speech → shorter VOT • Utterance position • Utterance final → longer VOT

  38. Regression: trends observed • Missing from the picture • Contextual predictability • Stress • Disfluency • Emotion

  39. Overview • Background • Methodology • Data • Preliminary analysis • Regression model • Results • Discussion

  40. Discussion • Individual differences • Factors • Measurements • Other between-subject factors • Age • Gender • Average speaking rate

  41. Discussion • Relatively little variation is explained by the full model (19.11% in F07 and 16.62% in M08) • Factors missing from the picture: contextual predictability, stress, disfluency, etc. • Limitations of the linear regression model • Non-linear effects • Non-homogeneous effects • Mixture of categorical and continuous variables

  42. Discussion • Echoing and challenging previous findings • VOT and POA • Canonical ordering is observed in M08, but not in F07 • Word frequency effect • Overshadowed by word class distinction • Utterance-final lengthening • Significant in F07, but not in M08 • Speaking style? • Content words vs. function words? • Speed measures?

  43. Conclusion Still a long way to go to model VOT variation in spontaneous speech… Thanks! Any comments are welcome!

  44. Thanks to • Anonymous subjects • Contributors to the Buckeye corpus • Prof. Keith Johnson • Members of the phonology lab at UC Berkeley

  45. Selected references • Bell, A. et al. (1999). Forms of English function words: Effects of disfluencies, turn position, age and sex, and predictability. Proceedings of ICPhS-99. • Gahl, S. (in press). "Time" and "thyme" are not homophones: The effect of lemma frequency on word durations in a corpus of spontaneous speech. To appear in Language. • Pitt, M. et al. (2005). The Buckeye Corpus of conversational speech: Labeling conventions and a test of transcriber reliability. Speech Communication, Vol. 45, pp. 90-95. • Raymond et al. (2006). Word-internal /t,d/ deletion in spontaneous speech: Modeling the effects of extra-linguistic, lexical, and phonological factors. • Yao, Y. (2007). Closure duration and VOT of word-initial voiceless plosives in English in spontaneous connected speech. UC Berkeley PhonLab report.
