
Acoustic effects of variation in vocal effort by men, women and children

Acoustic effects of variation in vocal effort by men, women and children. Hartmut Traunmüller and Anders Eriksson assisted by Anita Andersson, Ingegerd Eklund and Jessika Rundlöv with financial support from HSFR and NUTEK for the period 94-07-01 -- 96-06-31 and from SU.



  1. Acoustic effects of variation in vocal effort by men, women and children. Hartmut Traunmüller and Anders Eriksson, assisted by Anita Andersson, Ingegerd Eklund and Jessika Rundlöv, with financial support from HSFR and NUTEK for the period 94-07-01 -- 96-06-31 and from SU.

  2. Acoustic properties of speech sounds vary because of linguistic, expressive, organic, and perspectival factors.

  3. This investigation is mainly concerned with expressive variation (vocal effort; mode of phonation: whispering vs. phonating) and its interactions with organic variation in age and sex (men, women, children).

  4. Vocal effort: a subjective, physiological quantity. Voice level: an acoustic quantity (SPL of a standard utterance measured at a standard distance).

  5. Alternative ways of controlling voice level. Trained speaker's/singer's technique: more variation in pulmonic pressure, F0 less affected. Ordinary speaker's technique: more variation in vocal fold tension, F0 more affected.

  6. Adopted definition and quantification of "vocal effort": "Vocal effort" is the quantity that ordinary speakers vary when they adapt their speech to the demands of an increased or decreased communicational distance.

  7. Adjusting "loudness level" (Holmberg, Hillman and Perkell, 1988); shouting (Rostolland, 1982); speaking in noise (Rastatter and Rivers, 1983; Loren, Colcord, and Rastatter, 1986; Van Summers, Pisoni, Bernacki, Pedlow, and Stokes, 1988; Bond, Moore, and Gable, 1989). Different effects of white and multitalker noise with the same SPL (Rivers and Rastatter, 1985).

  8. Variation in vocal effort affects the shape of the glottal pulses (vocal fold closing velocity and relative closed interval duration) (Holmberg et al., 1988; Holmberg, Hillman, Perkell, Guiod, and Goldman, 1995; Södersten, Hertegård and Hammarberg, 1995). This is reflected in the spectral emphasis of the partials above the first (Gauffin and Sundberg, 1989; Childers and Lee, 1991; Granström and Nord, 1992).

  9. Variation in vocal effort affects F1 (Frøkjær-Jensen, 1966; Rostolland and Parant, 1974; Schulman, 1985; Bond et al., 1989; Liénard and Di Benedetto, 1999). F1 is difficult to measure, but a more open mouth gives a higher F1.

  10. Variation in vocal effort affects segment durations (Fónagy and Fónagy, 1966; Rostolland, 1982; Bonnot and Chevrie-Muller, 1991). Larger effort: longer vowels but somewhat shorter consonants.

  11. SPL as a measure of vocal effort? (Liénard and Di Benedetto, 1999). SPL plays no major part in judgments of vocal effort (Rundlöf, 1996; Traunmüller, 1997; Eriksson and Traunmüller, 1999). SPL varies widely as a function of perspectival factors. Listeners distinguish variations in a speaker's vocal effort from variations in their own distance from the speaker (Wilkens and Bartel, 1977; Eriksson and Traunmüller, 1999).

  12. Our measure of vocal effort: The average rating, by a group of listeners, of the communicational distance for each stimulus. Our aim: Acquire detailed quantitative knowledge about those acoustic effects of variations in vocal effort that are of perceptual importance.

  13. Relevant to: speech synthesis with desired paralinguistic quality; automatic recognition of linguistic information; automatic recognition of expressive information; automatic recognition of organic information; conversion of paralinguistic quality; automatic speech-to-speech translation with conserved paralinguistic quality; theories of speech (the Modulation Theory).

  14. Subjects: 6 male adults, 20–51 years; 6 female adults, 20–38 years; 4 boys, 7 years; 4 girls, 7 years; all speaking Stockholm Swedish.

  15. Speech material. Anita: "Hur många kort tog du av varje färg?" (How many cards did you take of each color?) Subject: "Jag tog ett violett, åtta svarta och sex vita." (I took one violet, eight black and six white.) 5 phonated and 2 whispered versions.

  16. Recording. Place: Långängen, Lidingö. DAT recorder. High-quality microphone, wind protected, 50 mm from the speaker's lips. Stepwise attenuator: 0, 8, 16, 24, and 32 dB. Sampling at 16 kHz, 16 bits per sample. High-pass filtering at 70 Hz, 48 dB/octave. ESPS/Waves. For formant frequency measurements, the recordings were resampled at 6.4 kHz for men, 8 kHz for women, and 10.667 kHz for children.

  17. Table I. Distances between speaker and addressee. The full range was used for phonated speech. Whispered speech was only used at the two shortest distances.

      Version        1     2     3     4      5
      Distance (m)   0.3   1.5   7.5   37.5   187.5

  18. Acoustic measurements. Sound pressure levels: SPLv (voiced and potentially voiced segments); SPLs (three [s]-es); SPL0 (voiced segments low-pass filtered at 1.5 × F0mean, 18 dB/oct.). Spectral emphasis: SPLv–SPL0. Fundamental frequency: F0 (mean and SD, excluding creaky-voiced sections). Formant frequencies: F1a (average of four [a]-s); F3 (average of voiced and potentially voiced segments). Segment durations: durV (average of 14 vowels, 3 [v] and 1 [j]); durC (average of 8 stops, 3 [s] and 1 [l]).
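
The spectral emphasis measure (SPLv–SPL0) can be illustrated with a small sketch. It rests on two assumptions that are not in the slides: the voiced sound is idealized as a set of harmonics of F0 with hypothetical amplitudes, and the 18 dB/octave low-pass filter is modeled as a third-order Butterworth magnitude response. The slides give only the cutoff (1.5 times the mean F0) and the slope.

```python
import math

def level_db(powers):
    """Level in dB (arbitrary reference) of a sum of component powers."""
    return 10.0 * math.log10(sum(powers))

def lowpass_gain_sq(freq, cutoff, order=3):
    """Squared magnitude of a Butterworth low-pass (assumed filter shape).
    Order 3 corresponds to an 18 dB/octave roll-off."""
    return 1.0 / (1.0 + (freq / cutoff) ** (2 * order))

def harmonic_amplitudes(n, slope_db_per_octave):
    """Hypothetical source spectrum: n harmonics with a constant tilt."""
    return [10 ** (slope_db_per_octave * math.log2(k) / 20.0)
            for k in range(1, n + 1)]

def spectral_emphasis(f0, amplitudes):
    """SPLv - SPL0 for harmonics at k * f0 with the given amplitudes."""
    cutoff = 1.5 * f0  # cutoff stated on the slide
    powers = [a * a / 2.0 for a in amplitudes]
    filtered = [p * lowpass_gain_sq((k + 1) * f0, cutoff)
                for k, p in enumerate(powers)]
    return level_db(powers) - level_db(filtered)

# A steeper tilt stands in for softer speech, a flatter tilt for louder speech.
soft = spectral_emphasis(120.0, harmonic_amplitudes(20, -12.0))
loud = spectral_emphasis(120.0, harmonic_amplitudes(20, -6.0))
print(f"steep spectrum: {soft:.2f} dB, flatter spectrum: {loud:.2f} dB")
```

Because SPL0 is dominated by the fundamental, the emphasis grows as more energy moves into the partials above the first.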

  19. The measure of vocal effort. Exp. 1: 20 listeners, phonated utterances, original SPL. Exp. 2: 20 listeners, phonated utterances, SPL random +/- 6 dB.

      Geometric means of distances in meters
      Real        0.375   1.5    7.5   37.5   187.5
      Estimated   0.47    0.69   1.9   7.5    31

      Exp. 2 (dep.) vs. Exp. 1 (indep.): r = 0.993, slope = 0.93. Estimated (dep.) vs. real distance (indep.): r = 0.90.

      Rundlöf J. (1996). Perceptuella ledtrådar vid auditiv bedömning av avståndet mellan talare och lyssnare. D-uppsats, lingvistik, SU.
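
As a minimal sketch of how listener estimates can be pooled into a geometric mean, the following uses made-up distance estimates; the values are hypothetical and are not data from the experiments.

```python
import math
from statistics import geometric_mean

# Hypothetical distance estimates (in meters) from five listeners
# for a single stimulus; not data from the study.
estimates_m = [0.5, 1.0, 2.0, 4.0, 8.0]

gm = geometric_mean(estimates_m)

# Equivalent formulation: average the base-2 logarithms, then convert back.
mean_log2 = sum(math.log2(d) for d in estimates_m) / len(estimates_m)
gm_from_logs = 2.0 ** mean_log2

print(gm, gm_from_logs)
```

Averaging on a logarithmic scale suits distances that span several orders of magnitude, since a single very large estimate does not dominate the mean.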

  20. Extrinsic factors: (1) communicational distance, log2(distance in meters); (2) "closeness", e^(1-n) (see Fig. 1); (3) wind noise (wind velocity in m/s); (4) speaker age, log2(age in years); (5) boyhood (1, 0); (6) manhood (1, 0); (7) speaker-specific constants (speaker-specific average prediction error).
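
The first two factors can be computed for the five recording distances of Table I, as sketched here. The base-2 logarithm of the distance follows the slide directly; reading n in the "closeness" term as the version number (1 to 5) is an assumption, since the slides do not define n.

```python
import math

# Distances from Table I (meters), keyed by version number.
DISTANCES_M = {1: 0.3, 2: 1.5, 3: 7.5, 4: 37.5, 5: 187.5}

def distance_term(distance_m):
    """Factor (1): base-2 logarithm of the communicational distance."""
    return math.log2(distance_m)

def closeness_term(n):
    """Factor (2): e^(1-n); n is ASSUMED to be the version number."""
    return math.exp(1 - n)

for n, d in DISTANCES_M.items():
    print(f"version {n}: log2(d) = {distance_term(d):6.3f}, "
          f"closeness = {closeness_term(n):.3f}")
```

Each step in Table I multiplies the distance by 5, so the distance term rises by a constant log2(5) per version, while the closeness term decays quickly and mainly separates the shortest distance from the rest.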

  21. FIG. 1. The average sound pressure level (SPLv), with an arbitrary reference, of the voiced and potentially voiced segments in the phonated and whispered utterances produced by men, women, boys, and girls.

  22. FIG. 2. The contribution of the environmental and speaker-specific factors (1) communicational distance, (2) "closeness", (3) wind noise, (4) speaker age, (5) boyhood, (6) manhood, and (7) speaker-specific constants, to the variation in acoustic variables measured in the phonated utterances. These variables were (from left to right) SPLv, SPL0, spectral emphasis (SPLv–SPL0), SPLs, utterance average F0, F1a, F3, and the durations of vowel-like (durV) and consonantal segments (durC).

  23. Sound pressure levels The dependent variables were SPLv, SPL0, spectral emphasis (SPLv–SPL0), and SPLs, for all of which the effect is expressed in dB.

  24. Table III. Occurrence of creaky voice, in % of the total duration of the voiced segments.

  25. F0 and formant frequencies The dependent variables were F0, F1 of the [a]-segments, and F3 of the voiced segments, for all of which the effect is expressed as a factor.

  26. Table IV. Mean values and standard deviations of F0 as a function of distance. Standard deviations also expressed in semitones.
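
Expressing a standard deviation of F0 in semitones can be sketched as follows. The F0 values are hypothetical, and computing the SD on a semitone scale relative to the mean is one plausible procedure; the slides do not say how the conversion was done.

```python
import math
from statistics import mean, stdev

def hz_to_semitones(f_hz, ref_hz):
    """Interval between f_hz and ref_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(f_hz / ref_hz)

# Hypothetical F0 samples (Hz) from one utterance; not data from the study.
f0_hz = [110.0, 123.0, 131.0, 147.0, 165.0]

ref = mean(f0_hz)
sd_hz = stdev(f0_hz)
sd_semitones = stdev([hz_to_semitones(f, ref) for f in f0_hz])

print(f"SD = {sd_hz:.1f} Hz = {sd_semitones:.2f} semitones")
```

A semitone scale makes F0 variability comparable across speakers whose mean F0 differs widely, such as men and children.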

  27. Segment durations The dependent variables were the durations of vowel-like (durV) and consonantal segments (durC), for which the effect is expressed as a factor.

  28. Table V. The mean pausing time, in ms, in all phonated and whispered utterances after the word listed in the first column. FIG. 3. The mean of the total pause duration (in ms) in phonated and whispered utterances shown as a function of the communicational distance for men, women, boys, and girls.

  29. FIG. 4. SPLv (above), SPLs (middle), and the spectral emphasis SPLv–SPL0 (below) shown as a function of vocal effort level VEL = log2(d), where d is the perceived communicational distance in meters. Regression lines are fitted to the whole set of data for SPLv and emphasis, and to the data obtained from each speaker group, men (solid lines), women (broken), boys (dashed), and girls (dotted), for SPLs.

  30. Fig. 5. Mean values of F0, F1a, and F3, shown as a function of VEL for men, women, boys, and girls. Regression lines fitted to each variable (solid, dotted, broken lines) and speaker group.

  31. Fig. 6. Mean values of F0, F1 of the [a]-segments, and F3, plotted as a function of F0. Regression lines shown for each variable and speaker group, men (solid lines), women (broken), boys (dashed), and girls (dotted).

  32. For a 100% increase in F0, F1a increased by 42% for men (r = 0.90), 71% for women (r = 0.92), 95% for boys (r = 0.94), 124% for girls (r = 0.94).
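
The percentages above can be restated as exponents if one assumes a power-law relation, F1a proportional to F0^k, which is a straight line on log-log axes. That functional form is an assumption made here for illustration; the slide reports only the increase per doubling of F0.

```python
import math

# Increase in F1a (percent) per 100% increase in F0, as given on the slide.
PERCENT_INCREASE_PER_F0_DOUBLING = {
    "men": 42, "women": 71, "boys": 95, "girls": 124,
}

def exponent(group):
    """Exponent k in the ASSUMED power law F1a ~ F0**k."""
    pct = PERCENT_INCREASE_PER_F0_DOUBLING[group]
    return math.log2(1.0 + pct / 100.0)

def f1_ratio(f0_ratio, group):
    """Predicted factor of change in F1a for a given factor of change in F0."""
    return f0_ratio ** exponent(group)

for group in PERCENT_INCREASE_PER_F0_DOUBLING:
    print(f"{group}: k = {exponent(group):.3f}, "
          f"F1a factor for F0 x 1.5 = {f1_ratio(1.5, group):.3f}")
```

Under this reading, an exponent above 1 (girls) means F1a rose proportionally more than F0, while an exponent near 0.5 (men) means it rose roughly with the square root of F0.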

  33. There is a positive correlation between F1 and F0 (large effect) in realizations of the same linguistic strings by speakers who differ in age and/or sex, and by the same speakers who alter their pitch register. “Intrinsic pitch”: a negative correlation between F1 and F0 (small effect) in vowels produced by a given speaker in the same linguistic and paralinguistic context.

  34. Increases in vocal effort simultaneously involve: higher subglottal pressure (higher SPL, …); higher vocal fold tension (higher F0, …); greater vocal tract openness (higher F1, …).

  35. Recognition of vocal effort. Correlation coefficients of acoustic variables with vocal effort level (VEL): SPL0, 0.95 (exceptional); SPLv, 0.98 (exceptional); (SPLv–SPL0), 0.90; F0 and F3, 0.87; F0, F3, and Emph, 0.96; F0, F3, Emph, and log2(durV/durC), 0.97 (std. err. of est. 0.64 units). Whispering [no F0, no spectral emphasis]: F3, F1a, and log2(durV/durC), 0.90.

  36. Fig. 7. Mean durations of vowel-like segments (above) and consonantal segments (below) shown as a function of VEL. Locally weighted least squares regression lines fitted to the data obtained from each speaker group, men (solid lines), women (broken), boys (dashed), and girls (dotted).

  37. Table VI. Mean values and standard deviations of differences between whispered and voiced versions of the same utterance produced by the same speakers at the same communicational distance (0.3 and 1.5 m). The significance level of the difference between the age groups is also indicated.

  38. Table VII. Mean perceived and calculated distances between speaker and addressee for the phonated versions compared with distances calculated using the same equations for the whispered versions. The independent variables were F1a, F3, durV, and durC.

  39. Fig. 8. The gross difference in spectral energy distribution between whispered and phonated versions of the same utterance produced by men, women, boys, and girls at the same communicational distance (0.3 and 1.5 m), based on level measurements in frequency bands covering 3 critical bands with overlap.
