Speaking Style Conversion
Dr. Elizabeth Godoy
Speech Processing Guest Lecture
December 11, 2012
Apply voice conversion (VC) principles to a different problem… E.Godoy, Speaking Style Conversion
Speech Intelligibility Context
• Speech is often heard in adverse conditions
  • Noisy environments
  • Listener has difficulty hearing/understanding
• How can speech be transformed to make it more intelligible?
  • To make speech synthesis systems more effective
[Audio examples: speech with noise and with no noise. With environmental barriers, the speech is not very intelligible.]
Intelligible Speaking Styles
• Lombard speech
  • Speaker is immersed in noise
  • Human reflex to increase speech loudness
• Clear speech
  • Listener faces a barrier (noise, hearing, language, …)
  • Speaker adapts strategy to increase speech clarity
[Audio examples: normal vs. Lombard; casual vs. clear]
VC to improve speech intelligibility?
• Voice Conversion
  • Modify speech to change the speaker identity
  • Learn transformation from source-to-target speaker
• Speaking Style Conversion
  • Modify speech to improve intelligibility
  • Determine transformation from normal-to-intelligible style
• Spectral envelope: still very important!
Overview: Analyses-to-Modifications
• Acoustic analyses to identify (mainly spectral) characteristics of Lombard & Clear styles
  • Average spectra
  • Vowel spaces
• Results of the analyses inspire spectral modifications to improve intelligibility
  • Spectral energy band boosting (corrective filters)
  • Formant shifting (frequency warping)
Corpora
• Lombard-normal: Grid
  • 8 speakers (4 male, 4 female)
  • 50 sentences each
  • LombardNinf96: most extreme (Lu & Cooke)
• Clear-casual: LUCID read sentences
  • 8 speakers (4 male, 4 female)
  • 50 sentences each
  • Read speech: most exaggerated (Baker & Hazan)
Average Relative Spectra
• Recall Amplitude Scaling in DFWA
• The average relative spectrum is similar:
  • Difference between the normal (X) and intelligible (Y) styles
  • Averaged across all frames
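A minimal sketch of the average relative spectrum described above. The slide does not give the exact formula, so this assumes the difference is taken in dB between per-style mean log-magnitude spectra; the function name and the toy frames are hypothetical.

```python
import math

def average_relative_spectrum(frames_x, frames_y, eps=1e-12):
    """Per-bin dB difference between style Y and style X.

    frames_x, frames_y: lists of magnitude spectra (one list of floats
    per frame), all with the same number of bins. Each style is averaged
    over its own frames first, so frames need not be time-aligned.
    (Assumption: averaging in the dB domain.)
    """
    def mean_db(frames):
        n_bins = len(frames[0])
        sums = [0.0] * n_bins
        for frame in frames:
            for k, mag in enumerate(frame):
                # eps guards against log10(0) on silent bins
                sums[k] += 20.0 * math.log10(max(mag, eps))
        return [s / len(frames) for s in sums]

    mean_x = mean_db(frames_x)
    mean_y = mean_db(frames_y)
    return [y - x for x, y in zip(mean_x, mean_y)]

# Toy data: the "intelligible" style has 10x the magnitude in the middle bin.
normal = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
lombard = [[1.0, 10.0, 1.0], [1.0, 10.0, 1.0]]
relative = average_relative_spectrum(normal, lombard)
```

A positive value in a bin means the intelligible style carries more energy there than the normal style.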
Average Relative Spectra (by Speaker)
[Figure panels: Clear-casual; Lombard-normal]
Average Relative Spectra (Overall)
• Lombard speech: spectral energy boosting “where formants are” (~500-4500 Hz)
• Clear speech: varies depending on speaker strategy; the differences are mild overall
Vowel Spaces (average for all speakers)
• Lombard speech: Vowel Space Translation
• Clear speech: Vowel Space Expansion
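One way to tell translation from expansion is to compare the area and the mean position of the vowel polygon in the (F1, F2) plane. The sketch below assumes that measure; the formant values and function names are hypothetical, not taken from the corpora.

```python
def polygon_area(points):
    """Shoelace formula; points are (F1, F2) vertices in order around the polygon."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def mean_formants(points):
    """Mean (F1, F2) over the vowel vertices."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scale_about_mean(points, factor):
    """Push every vowel away from (factor > 1) or toward the mean position."""
    cx, cy = mean_formants(points)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in points]

# Hypothetical corner vowels /i/, /a/, /u/ as (F1, F2) in Hz.
casual = [(300, 2300), (700, 1200), (350, 800)]
translated = [(f1 + 100, f2) for f1, f2 in casual]   # Lombard-like: shift only
expanded = scale_about_mean(casual, 1.2)             # clear-like: larger space

area_casual = polygon_area(casual)
area_translated = polygon_area(translated)
area_expanded = polygon_area(expanded)
```

Translation moves the mean position and leaves the area unchanged; expansion keeps the mean position and increases the area.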
Inspiration for Speech Modifications
• Spectral energy band boosting (Lombard)
• Vowel space expansion (Clear)
• Features associated with increased speech intelligibility
  • Though not observed together in human speech production…
  • Signal processing algorithms can accomplish both!
Spectral Energy Band Boosting
• Corrective Filters
[Figure panels: Lombard-inspired & Enhanced (high SII); Corrective Filter: Varying Gain]
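A minimal sketch of band boosting on a single magnitude spectrum. The filter shapes used in the lecture are not recoverable from the slide, so this assumes a flat (rectangular) gain over the ~500-4500 Hz region noted for Lombard speech; `boost_band`, the 6 dB default, and the toy spectrum are hypothetical.

```python
def boost_band(magnitudes, sample_rate, low_hz=500.0, high_hz=4500.0, gain_db=6.0):
    """Apply a flat gain to the bins whose frequency lies in [low_hz, high_hz].

    magnitudes: magnitude spectrum for bins 0..N/2 of a real FFT frame
    (at least two bins). Assumption: rectangular gain, not the lecture's
    actual corrective filter.
    """
    n_bins = len(magnitudes)
    nyquist = sample_rate / 2.0
    gain = 10.0 ** (gain_db / 20.0)   # dB to linear amplitude factor
    boosted = []
    for k, mag in enumerate(magnitudes):
        freq = k * nyquist / (n_bins - 1)   # bin centre frequency in Hz
        boosted.append(mag * gain if low_hz <= freq <= high_hz else mag)
    return boosted

# Toy flat spectrum: 9 bins at 16 kHz, so bins sit at 0, 1000, ..., 8000 Hz.
flat = [1.0] * 9
boosted = boost_band(flat, 16000)
```

Varying `gain_db` corresponds to the "varying gain" idea on the slide. In practice the result would usually be renormalized so the overall signal energy stays comparable.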
Frequency Warping for Vowel Space (VS) Expansion
• Curve fitting of formant shifts inspires the warping…
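A minimal sketch of frequency warping applied to a spectral envelope. The fitted warping curve from the lecture is not given here, so this assumes a piecewise-linear warp defined by breakpoints; the function names, breakpoints, and toy envelope are hypothetical.

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be increasing. Clamps outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])

def warp_envelope(envelope, bin_hz, src_hz, dst_hz):
    """Move envelope content from src_hz[i] to dst_hz[i].

    src_hz and dst_hz are increasing breakpoint lists with shared endpoints
    (assumption: piecewise-linear warp between breakpoints).
    """
    freqs = [k * bin_hz for k in range(len(envelope))]
    warped = []
    for f in freqs:
        source_f = interp(f, dst_hz, src_hz)   # inverse warp: where did f come from?
        warped.append(interp(source_f, freqs, envelope))
    return warped

# Toy envelope with one "formant" peak at 1000 Hz (bin 2, 500 Hz per bin).
envelope = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
warped = warp_envelope(envelope, 500.0,
                       src_hz=[0.0, 1000.0, 4000.0],
                       dst_hz=[0.0, 1500.0, 4000.0])
```

With these breakpoints the peak moves from 1000 Hz to 1500 Hz while the band edges stay fixed, which is the kind of formant shift that enlarges the vowel space.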
Sound Samples
With Noise (SSN, 0 dB)
• Original
• Warp
• Boost
• BW
No Noise
• Original
• WarpE
• Boost
• BW
Want more?
• See Maria’s presentation for more details…
Voice & Speaking Style Conversion Parallels
• Voice Conversion
  • Dynamic Frequency Warping + Amplitude Scaling (based on acoustic-phonetic spaces of source & target speakers)
• Speaking Style Conversion
  • Frequency Warping + Corrective Filter
  • Clear-speech inspired frequency warping for vowel space expansion
  • Lombard-speech inspired corrective filters to increase loudness
Thank you! More Questions?
Objective Metrics for Evaluation
• Loudness
  • Energy in frequency bands weighted based on human hearing
• Speech Intelligibility Index (SII)
  • Energy & modulations in frequency bands relative to a noise masker
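A heavily simplified sketch of the band-audibility idea behind the SII: each band's speech-to-noise ratio is mapped to an audibility between 0 and 1 and the bands are combined with importance weights. It assumes uniform band importance and omits hearing thresholds, spread of masking, level distortion, and the modulation handling of the extended SII; the function name and band levels are hypothetical.

```python
def simplified_sii(speech_db, noise_db, importance=None):
    """Weighted mean band audibility in [0, 1].

    speech_db, noise_db: per-band levels in dB (same length).
    importance: per-band weights summing to 1 (assumption: uniform if omitted;
    the standard uses tabulated band-importance values instead).
    """
    n = len(speech_db)
    if importance is None:
        importance = [1.0 / n] * n
    total = 0.0
    for s, nz, w in zip(speech_db, noise_db, importance):
        # SNR of -15 dB or less counts as inaudible, +15 dB or more as fully audible
        audibility = min(1.0, max(0.0, (s - nz + 15.0) / 30.0))
        total += w * audibility
    return total

# Toy example: four bands at 0 dB SNR, then a 6 dB boost in the two middle bands.
noise = [60.0, 60.0, 60.0, 60.0]
original = [60.0, 60.0, 60.0, 60.0]
boosted = [60.0, 66.0, 66.0, 60.0]
sii_original = simplified_sii(original, noise)
sii_boosted = simplified_sii(boosted, noise)
```

Boosting speech energy in a band raises that band's audibility against the masker. A metric built this way rewards added energy, so it is not designed to register gains that come from other changes.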
Loudness Distributions
• Lombard speech: “louder” for voiced speech (bi-modal)
• Clear speech: not “louder” than casual speech
• Transients: not distinguished by either style on average
Extended SII Distributions
• extSII is highly correlated with average loudness
• Lombard speech is objectively more intelligible
• Clear speech intelligibility gain is not captured by extSII
  • Limitations of objective intelligibility metrics
Observations from Analyses
• Lombard Speech
  • Spectral boosting in the region that includes the formants
  • Increase in loudness (also extSII)
  • Vowel space translation, but no expansion
• Clear Speech
  • Small changes in average spectra (slight spectral “flattening”)
  • Consistent vowel space expansion
  • Greater vowel discrimination
• Comparison between styles
  • Acoustic differences
    • translate into perceptual distinctions
    • are linked to intelligibility gains
  • Spectral boosting & vowel space expansion: mutually exclusive