Physical modeling of speech and voice quality. XV Pacific Voice Conference, PVSF-PIXAR, 3-11-05. Brad Story, Dept. of Speech, Language and Hearing Sciences, University of Arizona, Tucson, AZ
Physical model of human sound production: A mathematical representation of the physical processes that produce the sounds of speech and song.
Sound Production = Combination of Sound Sources and Filters. Primary Source: Vibration of the vocal folds creates a time-varying airflow (glottal flow). Filter: Air spaces created by the trachea, pharynx, oral cavity, and nasal passages. [Figure labels: trachea, nasal tract]
Source Model: vocal fold vibration, glottal area, glottal flow, etc. (F0, intensity, breathy/pressed, registers, biphonation, etc.). Filter Model: acoustic wave propagation through the air spaces of the trachea, vocal and nasal tracts (formant frequencies F1, F2, F3, …). Time variation of vocal tract shape and time variation of glottal parameters: speech/song/other.
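The source/filter split above can be illustrated with a very small synthesis sketch. This is not the physical model used in the talk: the source is assumed to be a bare impulse train at F0, the filter is assumed to be a cascade of two-pole digital resonators (one per formant), and the formant frequencies and bandwidths are illustrative values chosen for this example only.

```python
import math

def impulse_train(f0_hz, duration_s, fs_hz):
    """Crude stand-in for the glottal source: one unit impulse per pitch period."""
    n = int(duration_s * fs_hz)
    period = int(round(fs_hz / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(x, freq_hz, bw_hz, fs_hz):
    """Two-pole digital resonator standing in for a single formant."""
    r = math.exp(-math.pi * bw_hz / fs_hz)        # pole radius set by bandwidth
    theta = 2.0 * math.pi * freq_hz / fs_hz       # pole angle set by frequency
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    gain = 1.0 - a1 - a2                          # normalizes the gain at 0 Hz to 1
    y, y1, y2 = [], 0.0, 0.0
    for s in x:
        out = gain * s + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y

def synthesize(f0_hz, formants, duration_s=0.5, fs_hz=16000):
    """Pass the source through one resonator per (frequency, bandwidth) pair."""
    signal = impulse_train(f0_hz, duration_s, fs_hz)
    for freq_hz, bw_hz in formants:
        signal = resonator(signal, freq_hz, bw_hz, fs_hz)
    return signal

# Assumed, illustrative formant values (Hz) for a neutral vowel.
neutral = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]
wave = synthesize(110.0, neutral)
```

Changing `f0_hz` alters only the source, and changing the formant list alters only the filter, which is the separation the slide describes.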
[Figure: sagittal view of the vocal tract and its tubular approximation, from the vocal folds to the output pressure. Source: glottal area/flow at low, medium, and high F0.]
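One common way to turn a tubular approximation into formant frequencies is to treat the tract as a chain of short lossless cylinders and multiply their transmission (ABCD) matrices. The sketch below assumes that approach, an ideal open end at the lips (zero pressure), a sound speed of 35000 cm/s, and made-up area values. It is not necessarily the method behind the figure. Formants are taken as the frequencies where the D element of the chain matrix crosses zero.

```python
import math

C_CM_S = 35000.0  # assumed speed of sound in warm, moist air (cm/s)

def chain_d(areas_cm2, section_len_cm, freq_hz):
    """D element of the lossless chain matrix, sections ordered glottis to lips."""
    k = 2.0 * math.pi * freq_hz / C_CM_S
    cos_kl = math.cos(k * section_len_cm)
    sin_kl = math.sin(k * section_len_cm)
    a, b, c, d = 1.0, 0.0j, 0.0j, 1.0             # start from the identity matrix
    for area in areas_cm2:
        z = 1.0 / area                            # impedance up to a constant factor
        sa, sb, sc, sd = cos_kl, 1j * z * sin_kl, 1j * sin_kl / z, cos_kl
        a, b, c, d = a * sa + b * sc, a * sb + b * sd, c * sa + d * sc, c * sb + d * sd
    return d.real                                 # D is purely real without losses

def formants(areas_cm2, section_len_cm, max_hz=4000.0, step_hz=5.0):
    """Scan for sign changes of D(f) and refine each one by bisection."""
    found = []
    f_lo = step_hz
    d_lo = chain_d(areas_cm2, section_len_cm, f_lo)
    while f_lo < max_hz:
        f_hi = f_lo + step_hz
        d_hi = chain_d(areas_cm2, section_len_cm, f_hi)
        if d_lo != 0.0 and d_lo * d_hi <= 0.0:
            lo, hi, dl = f_lo, f_hi, d_lo
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                dm = chain_d(areas_cm2, section_len_cm, mid)
                if dl * dm <= 0.0:
                    hi = mid
                else:
                    lo, dl = mid, dm
            found.append(0.5 * (lo + hi))
        f_lo, d_lo = f_hi, d_hi
    return found

# Hypothetical area functions (cm^2), 20 sections of 0.875 cm each (17.5 cm total).
uniform = [3.0] * 20
narrow_back_wide_front = [1.0] * 10 + [8.0] * 10

uniform_formants = formants(uniform, 0.875)
two_tube_formants = formants(narrow_back_wide_front, 0.875)
```

For the uniform tube the zero crossings fall at the quarter-wavelength resonances (about 500, 1500, and 2500 Hz for 17.5 cm). Narrowing the back of the tube and widening the front raises F1 and lowers F2, which shows how shape alone shifts the formants.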
[Figure: output sound pressure; spectrogram with F1, F2, and F3 labeled; F0 contour.]
Change the vocal tract shape to produce speech. [Diagram: source, filter, output]
[Figure: output sound pressure; spectrogram with F1, F2, and F3 labeled; F0 contour.]
[Figure labels: F1, F2]
Modeling Voice Quality. What is voice quality? “Voice quality is conceived here in a broad sense, as the characteristic auditory coloring of an individual speaker’s voice, and not in the more narrow sense of the quality deriving solely from laryngeal activity.” From “The Phonetic Description of Voice Quality” by J. Laver (1980).
Auditory coloring of an individual speaker’s voice results from: laryngeal structure & settings; vocal tract structure & settings; respiratory system structure & settings; temporal control.
Voice Quality Changes based on vocal tract modifications • 1. Longitudinal: modification of vocal tract length • 2. Latitudinal: tendencies to maintain a particular constrictive (or expansive) effect on the vocal tract shape.
1. Modification of vocal tract length (+ temporal modification). L = 17.5 cm
1. Modification of vocal tract length (+ temporal and F0 modification). L = 17.5 cm, L = 23 cm, L = 11 cm. [Figure labels: 20%, 40%, 60%, 80%]
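To get a rough sense of what these length changes do to the formants, the tract can be idealized as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1) c / (4 L). The sketch below applies that textbook idealization to the three lengths on the slide. The sound speed of 35000 cm/s and the function names are assumptions made for this example, and a real vocal tract shape will deviate from these values.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, moist air

def uniform_tube_formants(length_cm, count=3, c_cm_s=SPEED_OF_SOUND_CM_S):
    """Quarter-wavelength resonances of a uniform closed-open tube, in Hz."""
    return [(2 * n - 1) * c_cm_s / (4.0 * length_cm) for n in range(1, count + 1)]

def scale_formants(formants_hz, old_length_cm, new_length_cm):
    """Under this idealization every formant scales inversely with tract length."""
    return [f * old_length_cm / new_length_cm for f in formants_hz]

for length_cm in (11.0, 17.5, 23.0):
    print(length_cm, [round(f) for f in uniform_tube_formants(length_cm)])
# 11.0 [795, 2386, 3977]
# 17.5 [500, 1500, 2500]
# 23.0 [380, 1141, 1902]
```

Shortening the tube from 17.5 cm to 11 cm raises all formants by the same factor (17.5 / 11), and lengthening it to 23 cm lowers them by 17.5 / 23, which is why length changes shift the perceived size of the speaker.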
2. Tendencies to maintain a particular constrictive (or expansive) effect on the vocal tract shape (+ F0 contour modification): “palatalized”, “pharyngealized”.
Beyond human voice quality… F0 = 30-50 Hz, vocal tract length = 31 cm; F0 = 600-1000 Hz, vocal tract length = 4.4 cm.
Singing…?? Singer’s formant
The End. This work was supported by NIH R01-DC04789.