ISCA Tutorial and Research Workshop on “Voice Quality: Functions, Analysis and Synthesis” (VOQUAL-2003), Geneva, Switzerland, 27-29 August 2003
Parameterisation of Glottal Waveforms for Characterisation of Laryngeal Voice-Quality
Parham Mokhtari, Hartmut Pfitzinger & Carlos Toshinori Ishi
JST/CREST-ESP Project, HIS Labs at ATR, Kyoto, Japan
Physiologically-motivated Acoustic Analysis of Speech
[Diagram: acoustic speech waveform → reliable centres → formants & F0 (acoustic domain) → robust mapping to the articulatory domain → vocal-tract area-function (articulation setting) and glottal waveform (phonation quality).]
Summary of Laver’s (1980) classification of laryngeal voice qualities
[Figure: voice qualities arranged by overall muscular tension setting, from lax voice to tense voice.]
Example snapshot of acoustic measurements
[Figure annotation: a single cycle of the glottal-flow waveform is retained for further analysis.]
Number and Phonetic Distribution of Reliable Acoustic Measurements
Total: 77 single-cycle glottal waveforms (all processed automatically, but hand-selected)
Prototype Glottal-flow Waveforms measured from Laver’s (1980) recording
[Figure axes: standardised time vs. standardised volume-velocity.]
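The slides do not say exactly how each cycle was standardised. A minimal sketch, assuming linear interpolation onto a fixed number of points (standardised time) and min-max scaling of the flow to [0, 1] (standardised volume-velocity); the function name `standardise_cycle` and the `n_points` default are hypothetical.

```python
import numpy as np

def standardise_cycle(flow, n_points=100):
    """Resample one glottal cycle onto a fixed grid and scale it to [0, 1].

    Assumption: linear interpolation plus min-max scaling; this is one
    possible standardisation, not necessarily the one used by the authors.
    """
    flow = np.asarray(flow, dtype=float)
    if flow.ndim != 1 or flow.size < 2:
        raise ValueError("need a 1-D cycle with at least two samples")
    # Standardised time: map the cycle onto [0, 1] and resample uniformly.
    t_old = np.linspace(0.0, 1.0, flow.size)
    t_new = np.linspace(0.0, 1.0, n_points)
    resampled = np.interp(t_new, t_old, flow)
    # Standardised volume-velocity: remove the floor, normalise the peak to 1.
    span = resampled.max() - resampled.min()
    if span == 0:
        raise ValueError("cycle has zero amplitude")
    return (resampled - resampled.min()) / span
```

One caveat: resampling every cycle to the same length discards its absolute duration, yet one of the principal components on the following slide is interpreted in terms of fundamental period, which suggests the authors' time standardisation retained some period information that this sketch does not.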
Parametric Models of the Glottal-flow Waveform: three well-known examples
Rosenberg (1971); Fant, Liljencrants & Lin (1985); Klatt & Klatt (1990)
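For comparison with the empirical approach, the trigonometric pulse commonly attributed to Rosenberg (1971) rises as a raised half-cosine during the opening phase, falls as a quarter-cosine during the closing phase, and stays at zero while the glottis is closed. This is a sketch under stated assumptions: the function name and the parameterisation by open quotient and speed quotient (with their defaults) are illustrative choices, not taken from the slides.

```python
import numpy as np

def rosenberg_pulse(fs, f0, open_quotient=0.6, speed_quotient=2.0, amplitude=1.0):
    """One period of a Rosenberg-style trigonometric glottal pulse.

    g(t) = 0.5 * (1 - cos(pi * t / Tp))        for 0 <= t <= Tp
    g(t) = cos(pi * (t - Tp) / (2 * Tn))       for Tp < t <= Tp + Tn
    g(t) = 0                                   otherwise (closed phase)

    open_quotient  = (Tp + Tn) / T0  (hypothetical parameter name)
    speed_quotient = Tp / Tn         (hypothetical parameter name)
    """
    t0 = 1.0 / f0
    t_open = open_quotient * t0
    tn = t_open / (1.0 + speed_quotient)   # closing-phase duration
    tp = t_open - tn                        # opening-phase duration
    t = np.arange(int(round(fs * t0))) / fs
    g = np.zeros_like(t)
    rising = t <= tp
    falling = (t > tp) & (t <= tp + tn)
    g[rising] = 0.5 * (1.0 - np.cos(np.pi * t[rising] / tp))
    g[falling] = np.cos(np.pi * (t[falling] - tp) / (2.0 * tn))
    return amplitude * g
```

A parametric model like this fixes the pulse shape in advance; the principal-component analysis that follows instead lets the basis shapes emerge from the measured waveforms.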
Principal Components of Glottal-flow Waveforms across the 13 voice qualities (the first 4 principal components explain 92.1% of total variance)
Variance explained: PC1 57.6%, PC2 23.2%, PC3 7.4%, PC4 3.9%
[Figure: principal-component waveforms plotted as standardised volume-velocity against standardised time. Interpretive panel labels: fundamental period; pulse skew & closing speed; speed of opening-phase & energy of pulse peak; single-peak versus diplophony.]
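A holistic decomposition of this kind can be sketched as ordinary PCA on the matrix of standardised cycles, one cycle per row. This is an assumption about the method: the slides report the variance explained by each component but not the exact procedure, and `glottal_pca` is a hypothetical name.

```python
import numpy as np

def glottal_pca(waveforms, n_components=4):
    """PCA of standardised glottal cycles (one cycle per row).

    Returns the mean waveform, the principal-component basis waveforms,
    each cycle's scores on those components, and the fraction of total
    variance explained by each component.
    Assumption: plain covariance PCA via SVD of the mean-centred matrix.
    """
    X = np.asarray(waveforms, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of vt are the basis waveforms, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    basis = vt[:n_components]
    # Scores are the coordinates used to plot cycles in the PC planes.
    scores = Xc @ basis.T
    return mean, basis, scores, explained[:n_components]
```

On the slides, the first four components explain 57.6 + 23.2 + 7.4 + 3.9 = 92.1% of the total variance; in this sketch the corresponding quantity is `explained.sum()`, and columns 0-1 and 2-3 of `scores` give coordinates in the I-II and III-IV planes.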
Distribution of the 77 glottal waveforms in the I-II and III-IV principal component planes
[Figure labels: lax voice; breathy voice; modal; harsh voice; creaky; tense voice & whispery voice; tense voice; falsetto; harsh whispery voice.]
Confusion Matrix for classification of the 77 glottal waveforms by a Decision Tree using Principal Components I, II, III and IV
[Table axes: original voice quality vs. voice quality classified as; 64% correct overall.]
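The overall percentage is the share of waveforms on the diagonal of the confusion matrix. A minimal sketch of that bookkeeping follows; the decision tree itself is not reproduced, the row/column convention (rows = original, columns = classified as) is this sketch's own choice, and the helper names and example labels are hypothetical.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, classes):
    """Count how often each original voice quality (row) is classified
    as each voice quality (column)."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for true, pred in zip(true_labels, predicted_labels):
        matrix[index[true], index[pred]] += 1
    return matrix

def percent_correct(matrix):
    """Overall accuracy: correctly classified (diagonal) over all waveforms."""
    return 100.0 * np.trace(matrix) / matrix.sum()

# Hypothetical labels for four waveforms, for illustration only.
classes = ["modal", "breathy", "creaky"]
m = confusion_matrix(["modal", "breathy", "modal", "creaky"],
                     ["modal", "modal", "modal", "creaky"], classes)
print(m)
print(percent_correct(m))
```

If the reported 64% is a rounded overall accuracy, it corresponds to 49 of the 77 waveforms (49/77 ≈ 63.6%).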
Conclusions
• Holistic approach to modelling the glottal-flow waveform
• Underlying basis functions found by empirical analyses
• Top 4 principal components explain over 90% of variance (92.1%)
• Top 4 PCs can distinguish among 13 voice qualities (64% correct with a decision tree)
Future Work
• Even more robust methods needed for fully automatic analysis
• Extension to spontaneous, conversational, expressive speech!
• Spectral and perceptual correlates of principal components…?
Our approach can adapt to a wide variety of phonation types.
Voice-quality control in speech synthesis!
End of Presentation – Thank You –
[Figure: glottal waveforms labelled “original”, “more breathy” and “more harsh”, annotated with lower and higher AQ values.]
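Definition of the Glottal AQ (Amplitude Quotient), with figures taken from Alku et al. (JASA, August 2002):
$\mathrm{AQ} = f_{ac} / d_{peak}$, where $f_{ac}$ is the peak-to-peak (AC) amplitude of the glottal flow and $d_{peak}$ is the magnitude of the negative peak of the glottal-flow derivative. For a stylised, triangular glottal-flow waveform, $\mathrm{AQ} = T_2$, the duration of the closing phase; AQ is closely related to the “effective decay time” (Fant et al., 1994).
[Figure panels: estimated glottal-flow waveforms and glottal-flow derivatives for (1) breathy phonation and (2) pressed phonation.]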
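The AQ can be measured directly from a sampled cycle. A sketch, assuming the derivative is approximated by a first difference scaled by the sampling rate `fs` (a common but not the only choice); `amplitude_quotient` is a hypothetical name.

```python
import numpy as np

def amplitude_quotient(flow, fs):
    """Glottal AQ = f_ac / d_peak, in seconds.

    f_ac:   peak-to-peak (AC) amplitude of the glottal flow
    d_peak: magnitude of the negative peak of the flow derivative,
            here approximated by a first difference times fs (assumption)
    """
    flow = np.asarray(flow, dtype=float)
    f_ac = flow.max() - flow.min()
    derivative = np.diff(flow) * fs
    d_peak = -derivative.min()
    if d_peak <= 0:
        raise ValueError("flow has no closing (negative-slope) phase")
    return f_ac / d_peak

# Stylised triangular cycle at 16 kHz: 4 ms opening, 2 ms closing, then closed.
fs = 16000
tri = np.concatenate([np.linspace(0.0, 1.0, 65),
                      np.linspace(1.0, 0.0, 33)[1:],
                      np.zeros(63)])
print(amplitude_quotient(tri, fs))
```

For the triangular cycle the closing slope is 1/(2 ms), so the printed value is very close to 0.002 s, i.e. the closing-phase duration, as the triangular case of the definition predicts.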
Contrasting phonation-types: glottal AQ automatically measured at reliable centres in continuous speech
[Figure panels: glottal waveform (72 ms), glottal-waveform derivative, and spectrum of glottal waveform over [0, 8] kHz.]
Pressed/Harsh phonation
• abrupt glottal closure
• pronounced negative peak of derivative
• higher harmonics prominent
• low AQ
Breathy/Nasal phonation
• quasi-sinusoidal glottal wave
• smooth derivative waveform
• fundamental most prominent
• high AQ
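To illustrate these contrasts, the sketch below compares two stylised single cycles at 100 Hz, sampled at 16 kHz: a raised cosine standing in for breathy phonation, and a pulse with a slow linear rise and an abrupt 0.5 ms closure standing in for pressed phonation. Both waveforms, the sampling rate and the helper names are illustrative assumptions, not measurements from the slides.

```python
import numpy as np

FS = 16000   # sampling rate in Hz (assumed)
N = 160      # samples per cycle, so F0 = FS / N = 100 Hz

def aq(cycle, fs=FS):
    """Amplitude quotient f_ac / d_peak, derivative by first difference."""
    d = np.diff(cycle) * fs
    return (cycle.max() - cycle.min()) / -d.min()

def higher_harmonic_ratio(cycle, n_harmonics=10):
    """Energy in harmonics 2..n relative to the fundamental.
    For exactly one period, DFT bin k corresponds to harmonic k."""
    h = np.abs(np.fft.rfft(cycle))[1:n_harmonics + 1]
    return np.sum(h[1:] ** 2) / h[0] ** 2

n = np.arange(N)
# Breathy-like: quasi-sinusoidal flow with a smooth derivative.
breathy = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / N))
# Pressed-like: linear rise over 120 samples, closure over 8 samples
# (0.5 ms), then a closed phase filling the rest of the period.
pressed = np.concatenate([np.linspace(0.0, 1.0, 121),
                          np.linspace(1.0, 0.0, 9)[1:],
                          np.zeros(N - 129)])

aq_breathy, aq_pressed = aq(breathy), aq(pressed)
ratio_breathy = higher_harmonic_ratio(breathy)
ratio_pressed = higher_harmonic_ratio(pressed)
print(f"AQ: breathy {aq_breathy * 1e3:.2f} ms, pressed {aq_pressed * 1e3:.2f} ms")
print(f"higher/fundamental energy: breathy {ratio_breathy:.1e}, pressed {ratio_pressed:.2f}")
```

The raised cosine contains only the fundamental (plus DC), so its higher-harmonic ratio is essentially zero and its AQ is larger; the abrupt closure produces a large negative derivative peak, hence a small AQ, and spreads energy into the higher harmonics.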