Modeling formant transition patterns in VCV sequences. Martine Toda * ** Shinji Maeda ** * Laboratoire de Phonétique et Phonologie, Université Paris III / CNRS UMR 7018 ** TELECOM ParisTech / CNRS LTCI UMR 5141. Background. Subject-dependent strategies in French /ʃ/.
Modeling formant transition patterns in VCV sequences Martine Toda * ** Shinji Maeda ** * Laboratoire de Phonétique et Phonologie, Université Paris III / CNRS UMR 7018 ** TELECOM ParisTech / CNRS LTCI UMR 5141
Subject-dependent strategies in French /ʃ/ • Sublingual cavity /ʃ/ • Palatalized /ʃ/ [Figure: sagittal contour tracings from MRI of /s/ and /ʃ/ for the two strategies, labelled with the palatal channel, lip protrusion, and sublingual cavity; frication noise spectra (axis mark: 12 kHz)] Cf. Toda, JEP 2006
F2 and F3 targets for [ʃ] in French subjects and their direction in the [ʃa] transition [Figure: palatal [ʃ] vs. sublingual cavity [ʃ]] After Toda, JEP 2006
Research question • Which [ʃ] and [a] vocal tract configurations make it possible to replicate the observed formant transition patterns in French [aʃa] sequences?
Target [aʃa] utterances • Fr4, French male subject: sublingual cavity /ʃ/ in /aʃa/ • Fr5, French male subject: palatalized /ʃ/ in /aʃa/ [Figure panels: Sublingual cavity; Palatalized]
Vocal tract synthesis of [aʃa] sequences • VCVsynt program, cf. Maeda (1984) • Section-by-section interpolation of the area functions between the vowel target and the fricative target • Presence/absence of noise source conditioned by the ratio between glottal and supraglottal constriction area [Figure: target area functions along the time axis, [a] → [ʃ] → [a]]
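The two mechanisms on this slide can be sketched in a few lines. This is an illustration only, not the VCVsynt code: the linear interpolation law, the function names, the direction of the area ratio, and the threshold value are all assumptions that the slide does not specify.

```python
import numpy as np

def interpolate_area_functions(vowel_areas, fricative_areas, n_frames):
    """Interpolate every tube section from the vowel target to the fricative target.

    Assumption: interpolation is linear in time; the slide only says
    "section-by-section interpolation".
    Returns an array of shape (n_frames, n_sections).
    """
    vowel = np.asarray(vowel_areas, dtype=float)
    fricative = np.asarray(fricative_areas, dtype=float)
    if vowel.shape != fricative.shape:
        raise ValueError("both targets need the same number of sections")
    weights = np.linspace(0.0, 1.0, n_frames)[:, None]  # 0 = vowel, 1 = fricative
    return (1.0 - weights) * vowel + weights * fricative

def noise_source_active(glottal_area, constriction_area, ratio_threshold=1.0):
    """Hypothetical gating rule: switch the noise source on when the glottal
    area is at least ratio_threshold times the supraglottal constriction area.
    Both the direction of the ratio and the threshold are placeholders."""
    return glottal_area / constriction_area >= ratio_threshold

# Toy targets with two sections each (areas in cm^2, invented values).
frames = interpolate_area_functions([4.0, 2.0], [1.0, 0.2], n_frames=4)
```

Running the interpolation in reverse (fricative to vowel) gives the second half of an [aʃa] sequence.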
Models of vocal-tract targets for [ʃ] • (a) Branching [ʃ] model with a sublingual cavity (side branch) • (b) Palatalized [ʃ] model with a palatal channel [Figure: [s] and [ʃ] area functions from glottis to lips for each model, with the pressure source marked (*)]
Length & ratio adjustment of the target VT area function of [a] for individual speakers • The subject-specific articulatory space is relevant to the vocal-tract shape of vowels (cf. Honda et al., ICSLP 1996) [Figure labels: H, V; Male, Female; Japanese /sj/]
Length & ratio adjustment of the target VT area function of [a] for individual speakers • Total length varying from 13 to 20 cm • Back/front ratio varying from 0.3 to 1.7 • Constant laryngeal cavity • Narrowing at the lip region (inspired by the articulatory models) • Gradual transition between the back and front tubes [Figure: female-like and male-like area functions; x-axis: distance from glottis (mm)]
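The parametrization listed on this slide could be sketched as follows. Every numeric default (section size, cavity lengths, cross-sectional areas, transition width) is a hypothetical placeholder, and treating the laryngeal cavity as part of the total length is also an assumption; only the two free parameters and the qualitative features come from the slide.

```python
import numpy as np

def vowel_a_area_function(total_length_cm, back_front_ratio, section_cm=0.5,
                          larynx_length_cm=2.0, larynx_area=0.5,
                          back_area=1.0, front_area=6.0,
                          lip_length_cm=1.0, lip_area=3.0, transition_cm=2.0):
    """Return (distance from glottis, area) for a two-tube [a]-like shape.

    Areas are in cm^2 and are illustrative values, not the authors' data.
    """
    n_sections = int(round(total_length_cm / section_cm))
    x = (np.arange(n_sections) + 0.5) * section_cm  # section centres, cm
    # Back:front length ratio r puts the tube boundary at L * r / (1 + r).
    boundary = total_length_cm * back_front_ratio / (1.0 + back_front_ratio)
    # Gradual (linear) transition between the back and front tubes.
    ramp = np.clip((x - (boundary - transition_cm / 2.0)) / transition_cm, 0.0, 1.0)
    areas = back_area + (front_area - back_area) * ramp
    # Constant laryngeal cavity, independent of length and ratio.
    areas[x < larynx_length_cm] = larynx_area
    # Narrowing at the lip region.
    areas[x > total_length_cm - lip_length_cm] = lip_area
    return x, areas

x, areas = vowel_a_area_function(15.0, 0.7)
```

Changing only `total_length_cm` and `back_front_ratio` then spans the family of candidate [a] shapes, from shorter female-like to longer male-like tracts.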
Finding the appropriate [a] configuration for each subject
• Model-utterance distance (ΔHz) calculated over F1 through F3:
$$\Delta = \sqrt{(F1_{\text{calculated}} - F1_{\text{measured}})^2 + (F2_{\text{calculated}} - F2_{\text{measured}})^2 + (F3_{\text{calculated}} - F3_{\text{measured}})^2}$$
• Best fit: length = 150 mm, back:front length ratio = 0.7:1
• The length of the fricative model is adjusted accordingly by lengthening or shortening the back cavity.
[Figure: model-utterance distance for subject Fr4 over VT length (13 cm to 17 cm) and back/front ratio (0.3 to 1.7); optimal [a] for 7 French subjects (6 male, 1 female)]
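The fitting step can be illustrated with a small grid search over VT length and back/front ratio. The formant calculation here is a stand-in, not the VCVsynt synthesis: it assumes a lossless two-tube model, closed at the glottis and ideally open at the lips, with hypothetical areas of 1 and 6 cm² and a sound speed of 35000 cm/s. Only the distance measure follows the slide.

```python
import itertools
import numpy as np

SPEED_OF_SOUND = 35000.0  # cm/s, assumed value

def tube_formants(lengths_cm, areas_cm2, n_formants=3, f_max=6000.0, step=5.0):
    """Resonances of a chain of lossless tubes (glottis first, lips last)."""
    freqs = np.arange(step, f_max, step)
    d_values = np.empty_like(freqs)
    for i, f in enumerate(freqs):
        k = 2.0 * np.pi * f / SPEED_OF_SOUND
        chain = np.eye(2, dtype=complex)
        for length, area in zip(lengths_cm, areas_cm2):
            c, s = np.cos(k * length), np.sin(k * length)
            # Chain matrix of one section; rho*c cancels in the D element.
            chain = chain @ np.array([[c, 1j * s / area], [1j * area * s, c]])
        d_values[i] = chain[1, 1].real  # resonances are the zeros of D(f)
    crossings = np.nonzero(np.sign(d_values[:-1]) != np.sign(d_values[1:]))[0]
    formants = []
    for i in crossings[:n_formants]:
        # Linear interpolation of the zero crossing between grid points.
        slope = (d_values[i + 1] - d_values[i]) / (freqs[i + 1] - freqs[i])
        formants.append(freqs[i] - d_values[i] / slope)
    return formants

def formant_distance(calculated, measured):
    """Euclidean distance in Hz over the listed formants (F1 through F3)."""
    diff = np.asarray(calculated, dtype=float) - np.asarray(measured, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

def fit_vowel_a(measured, lengths_cm, ratios, back_area=1.0, front_area=6.0):
    """Return (distance, total length, ratio) of the best grid point."""
    best = None
    for total, ratio in itertools.product(lengths_cm, ratios):
        back = total * ratio / (1.0 + ratio)  # back:front = ratio:1
        formants = tube_formants([back, total - back], [back_area, front_area])
        if len(formants) < len(measured):
            continue  # not enough resonances below f_max
        dist = formant_distance(formants[:len(measured)], measured)
        if best is None or dist < best[0]:
            best = (dist, total, ratio)
    return best

# Self-consistency check: recover parameters from formants the model produced.
target = tube_formants([15 * 0.7 / 1.7, 15 - 15 * 0.7 / 1.7], [1.0, 6.0])
best = fit_vowel_a(target, lengths_cm=[13, 15, 17], ratios=[0.3, 0.7, 1.7])
```

With measured formants from a real utterance in place of `target`, the same loop would return the grid point closest in the ΔHz sense, within the limits of this simplified tube model.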
Overview of the results [Figure: F2 and F3 trajectories of the syntheses with a sublingual cavity [ʃ] and with a palatalized [ʃ], arranged by [a] back/front ratio and VT length]
Best fit [Figure: natural speech compared with the sublingual cavity and palatalized syntheses] • Palatalized: F2 rising higher towards the fricative
Vocal tract shape during [aʃa] transitions [Figure: area functions from glottis to lips for [a] and [ʃ]: sublingual cavity [ʃ] (Fr4), with the sublingual side branch, and palatalized [ʃ] (Fr5)]
Application to speech inversion [Diagram: sequence of targets … each mapped to a vocal tract shape; Solution 1, Solution 2, Solution 3, …; Transformation matrix […]]
Summary • The vocal tract synthesis of [aʃa] was performed with the aim of replicating the subject-related variation of formant transition patterns. • The syntheses produced a variety of formant transitions. • The transitions are determined by the combination of [ʃ] and [a] vocal-tract configurations.
Summary • The F2 transition is sensitive to the [ʃ] configuration. • Palatalized [ʃ]: higher F2 onset • In the simulations, the direction of the higher formants depends on both the [ʃ] and the [a] configuration. • However, it is not certain that all of the combinations actually occur in real speech.
Further questions • Which factors condition the individual articulatory strategy? • Lower articulatory cost? [Diagram: for each of subjects A and B, the articulatory model is rescaled to fit the subject's vocal tract, the optimal configuration to fit the subject's acoustics is found, and Δʃ-a is compared]
Acknowledgement • The authors acknowledge the financial support of the Future and Emerging Technologies (FET) programme within the Sixth Framework Programme for Research of the European Commission, under FET-Open contract no. 021324.