ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL

Preeti Rao and Pushkar Patwardhan. Department of Electrical Engineering, Indian Institute of Technology, Bombay, India.

Presentation Transcript


  1. ON THE REPRESENTATION OF VOICE SOURCE APERIODICITIES IN THE MBE SPEECH CODING MODEL
     Preeti Rao and Pushkar Patwardhan
     Department of Electrical Engineering, Indian Institute of Technology, Bombay, India

  2. The MBE Speech Model (Griffin & Lim, 1988)
     [Figure labels: "Original", "MBE modeling", "Modeled", "X"]
     Department of Electrical Engineering, IIT Bombay

  3. Frame-based analysis
     Within the window, assume a constant-amplitude, constant-frequency sinusoidal model.

  4. MBE Speech Model Parameters
     [Block diagram: Windowed speech -> Parameter Estimation -> model parameters]
     • Band-wise voicing decisions
     • Pitch
     • Harmonic amplitudes
     (Phase is predicted for smoothness.)

  5. MBE Analysis: Parameter Estimation
     • Pitch and spectral amplitudes: analysis-by-synthesis matching of a predicted harmonic spectrum with the actual signal spectrum.
     • Voicing decision per frequency band (3 harmonics): based on the error between the actual and predicted spectra.
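The band-wise voicing decision on slide 5 can be sketched as follows. This is a minimal illustration and not the authors' implementation: the pitch `f0` is assumed to be already known (the full analysis also searches over pitch), and the Hamming window, the 12.5 Hz frequency grid, the fixed threshold of 0.2 and the function names are all assumptions made for the example. For each harmonic, the shifted window transform is fitted to the frame spectrum by least squares, and the normalised residual is pooled over groups of three harmonics.

```python
import cmath
import math


def spectrum_at(x, freq_hz, fs):
    """Fourier transform of the finite sequence x at one frequency in Hz."""
    step = -2j * math.pi * freq_hz / fs
    return sum(v * cmath.exp(step * n) for n, v in enumerate(x))


def band_voicing(frame, f0, fs, n_harm, per_band=3, thresh=0.2, df=12.5):
    """Return [(normalised_error, is_voiced), ...], one entry per band.

    thresh and df are illustrative values, not taken from the slides.
    """
    n = len(frame)
    win = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    xw = [w * v for w, v in zip(win, frame)]
    steps = int(round(f0 / df))
    results = []
    for first in range(1, n_harm + 1, per_band):
        err = energy = 0.0
        for k in range(first, min(first + per_band, n_harm + 1)):
            # Frequency grid covering the region owned by harmonic k.
            freqs = [(k - 0.5) * f0 + i * df for i in range(steps)]
            s = [spectrum_at(xw, f, fs) for f in freqs]
            # Predicted shape: window transform centred on k * f0.
            e = [spectrum_at(win, f - k * f0, fs) for f in freqs]
            # Least-squares complex amplitude of the predicted shape.
            a = (sum(si * ei.conjugate() for si, ei in zip(s, e))
                 / sum(abs(ei) ** 2 for ei in e))
            err += sum(abs(si - a * ei) ** 2 for si, ei in zip(s, e))
            energy += sum(abs(si) ** 2 for si in s)
        ratio = err / energy if energy > 0 else 1.0
        results.append((ratio, ratio < thresh))
    return results
```

A band whose spectrum is well explained by harmonics of `f0` gives a ratio near 0 and is marked voiced; a band dominated by energy that does not sit on the harmonics gives a ratio near 1 and is marked unvoiced.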

  6. MBE Analysis: Spectral Matching
     Voicing thresholds are frame-adapted as determined by experimental tuning.

  7. MBE Synthesis
     [Block diagram]
     • Voiced speech synthesis: Pitch and voiced amplitudes -> Linear Interpolation -> Bank of Harmonic Oscillators -> voiced speech
     • Unvoiced speech synthesis: White noise -> STFT -> Replace Envelope (unvoiced amplitudes) -> Weighted Overlap-Add -> unvoiced speech
     • Reconstructed speech = voiced speech + unvoiced speech
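The two synthesis branches on slide 7 can be illustrated for a single frame. The sketch below is a simplification under stated assumptions: it omits the linear interpolation of amplitudes between frames and the weighted overlap-add, it uses a plain DFT in place of an STFT, and the relative level of the noise and harmonic parts is not calibrated. The function and parameter names are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
            for k in range(n)]


def idft_real(spec):
    n = len(spec)
    return [sum(c * cmath.exp(2j * math.pi * k * i / n)
                for k, c in enumerate(spec)).real / n
            for i in range(n)]


def synth_frame(f0, amps, voiced, fs, n, rng):
    """amps[k-1] and voiced[k-1] describe harmonic k (amplitude, voicing flag)."""
    # Voiced branch: one cosine oscillator per voiced harmonic.
    v = [sum(a * math.cos(2 * math.pi * k * f0 * i / fs)
             for k, (a, is_v) in enumerate(zip(amps, voiced), start=1) if is_v)
         for i in range(n)]
    # Unvoiced branch: keep the phase of white noise, replace its magnitude.
    noise_spec = dft([rng.gauss(0.0, 1.0) for _ in range(n)])
    shaped = [0j] * n
    for b in range(n):
        f = min(b, n - b) * fs / n      # bin frequency, folded to 0..fs/2
        k = int(round(f / f0))          # nearest harmonic owns this bin
        if 1 <= k <= len(amps) and not voiced[k - 1]:
            mag = abs(noise_spec[b])
            if mag > 0:
                shaped[b] = noise_spec[b] / mag * amps[k - 1]
    u = idft_real(shaped)
    return [a + b for a, b in zip(v, u)]
```

For example, `synth_frame(200.0, [1.0, 0.5], [True, False], 8000, 80, random.Random(0))` renders the first harmonic as a sinusoid and the region around the second harmonic as noise with a flat magnitude.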

  8. Narrowband Speech Coding with MBE
     The efficient quantisation of MBE parameters has led to:
     • IMBE (Inmarsat) @ 4.15 kbps
     • DVSI MBE codecs @ >2 kbps
     • LR MBE (IITB) @ 1.5 kbps
     • Research groups (Univ. Surrey, UCSB, Sony Corp.) @ 1.2 kbps to 3 kbps
     [Labels: reference, modeled]

  9. Related Models: Speech Synthesis
     • Harmonics+Noise Model (HNM): Stylianou
     • Harmonic/Stochastic Model (H/S): Dutoit, 1996
     Emphasis is on natural-sounding wideband speech and easy prosody modification. Both use essentially the Griffin & Lim MBE analysis.
     Important differences:
     • Analysis and synthesis are pitch synchronous
     • Estimated harmonic phases are utilised in synthesis

  10. MBE Model: Limitations
      The codec speech quality does not improve with increasing bit rate => the model has its limitations.
      Assumption of frame-level quasi-stationarity: enables the accurate representation only of
      • vowels
      • unvoiced and voiced fricatives
      (not plosives, onsets, …)

  11. “Steady” Sounds: Voice Quality
      [Diagram labels: T1, T2, Tm; Glottal pulse; Vocal tract response; Speech signal]
      • Pitch cycle variations: jitter / shimmer (roughness / harshness)
      • Frication and aspiration (friction, breathiness)
      • Glottal pulse shape variation (brightness, vocal effort): sharp vs. dark

  12. Role of Model Excitation Parameters
      The glottal spectral shape (glottal waveform shape) can be captured by the spectral envelope parameters.
      But the perceptual effects of
      • vocal cord vibration aperiodicities
      • aspiration / frication noise
      must be reproduced (if at all) by the MB excitation.

  13. Effect of Aperiodicities on MBE Parameters
      Voice source aperiodicities distort the harmonic spectrum (esp. if the frame contains several pitch cycles).
      • Modulation (jitter-shimmer) aperiodicities => smearing of harmonic lobe structure; noise and subharmonics may be introduced.
      • Aspiration noise => additive noise in harmonic regions
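The smearing effect described on slide 13 can be reproduced with a toy source. The sketch below is not the synthetic vowel used in the talk: it generates a bare impulse train (no vocal tract filter) whose periods and amplitudes are perturbed by uniform random deviations, and it compares the energy midway between harmonics with the energy at the harmonics. The perturbation model, the Hann window and all names are assumptions for illustration.

```python
import cmath
import math
import random


def pulse_train(f0, fs, n, jitter=0.0, shimmer=0.0, seed=0):
    """Impulse train with randomly perturbed periods and amplitudes.

    jitter and shimmer are relative deviations (0.05 means up to +/-5 %).
    """
    rng = random.Random(seed)
    x = [0.0] * n
    t = 0.0
    while t < n:
        x[int(t)] += 1.0 + shimmer * rng.uniform(-1.0, 1.0)
        t += (fs / f0) * (1.0 + jitter * rng.uniform(-1.0, 1.0))
    return x


def interharmonic_ratio(x, f0, fs, n_harm=10):
    """Energy midway between harmonics divided by energy at the harmonics."""
    n = len(x)
    # Hann-windowed frame spanning several pitch cycles.
    xw = [v * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, v in enumerate(x)]

    def power_at(freq_hz):
        step = -2j * math.pi * freq_hz / fs
        return abs(sum(v * cmath.exp(step * i) for i, v in enumerate(xw))) ** 2

    on = sum(power_at(k * f0) for k in range(1, n_harm + 1))
    between = sum(power_at((k + 0.5) * f0) for k in range(1, n_harm + 1))
    return between / on
```

With `fs = 8000`, `f0 = 100` and `n = 800`, the strictly periodic train leaves essentially no energy between the harmonics, whereas adding jitter or shimmer raises the ratio. That extra inter-harmonic energy is what increases the spectral matching error in the MBE analysis.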

  14. MBE Analysis: Aperiodic Vowel
      Increase in the analysis spectrum matching error => MBE synthesis of unvoiced (UV, random noise) frequency bands

  15. Previous: On Multi-band Excitation
      • Fujimura, 1968: “A crude approximation of aperiodicity observed in natural speech can be made by distributing patches of random noise signals in the time-frequency space of the speech signal.”
      • Makhoul, 1978: “Spectral devoicing due to vocal cord vibration irregularities is an artifact of the spectral estimation, and it may not be appropriate to use a noise source for the synthesis…”
      • Griffin and Lim, 1988: justify the MBE model by quoting Fujimura, and also by their own observations with speech in noise.

  16. Synthetic Vowel: Modulation Aperiodicities

  17. Synthetic Vowel: Modulation Aperiodicities
      [Figure labels: Periodic ref (80 Hz, 160 Hz, 250 Hz); HIGH JITTER; HIGH SHIMMER]

  18. Fujimura-type Experiment
      Highly jittered vowel /ɑ/
      [Panels: Reference; MBE (note “unfused” noise); MBE-modeled with forced decisions]

  19. Experiments with Natural Speech
      Goal: to study the MBE representation of
      • Unvoiced and voiced fricatives
      • Breathy voice
      • Rough and hoarse voices
      • Speech in noisy background
      To understand the implications of simplifying the excitation to single-band excitation (SBE) or two-band excitation (TBE).
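The slides do not say how the band-wise decisions were reduced to SBE or TBE, so the following is only a hypothetical sketch of what such a simplification could look like. For TBE it picks the single voiced/unvoiced cutoff that flips the fewest band decisions; for SBE it takes a majority vote over the bands. Both rules and both function names are assumptions.

```python
def to_tbe(voiced):
    """Two-band excitation: voiced below one cutoff, unvoiced above it.

    The cutoff is chosen to change as few band decisions as possible
    (an assumed rule, not taken from the slides).
    """
    n = len(voiced)
    cutoff = min(range(n + 1),
                 key=lambda c: sum(v != (i < c) for i, v in enumerate(voiced)))
    return [i < cutoff for i in range(n)]


def to_sbe(voiced):
    """Single-band excitation: the whole frame follows the majority decision."""
    majority = 2 * sum(voiced) >= len(voiced)
    return [majority] * len(voiced)
```

Under these rules an isolated unvoiced band inside a voiced region is lost in both simplifications, since neither TBE nor SBE can place noise patches freely in the time-frequency space.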

  20. VCV: /ɑzɑ/
      [Panels: Reference; MBE-modeled; SBE-modeled]

  21. VCV: /ɑʒɑ/
      [Panels: Reference; MBE-modeled]

  22. Voice Quality: Breathy
      [Panels: Reference; MBE-modeled; TBE-modeled (buzzy)]

  23. Voice Quality: Harsh
      [Panels: Reference; MBE-modeled; TBE-modeled]

  24. Voice Quality: Rough
      [Panels: Reference; MBE-modeled]

  25. Noise Corrupted Speech (15 dB SNR)
      [Panels: Reference; MBE-modeled; TBE-modeled (buzzy)]
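A test signal at a given SNR, such as the 15 dB condition on slide 25, can be prepared by scaling a noise sequence against the measured signal power. The slides do not state the noise type, so white Gaussian noise is assumed here, and the function name is hypothetical.

```python
import math
import random


def add_noise(x, snr_db, rng):
    """Return x plus white Gaussian noise scaled to the requested SNR in dB."""
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    p_sig = sum(v * v for v in x) / len(x)
    p_noise = sum(v * v for v in noise) / len(noise)
    # Gain that makes p_sig / (gain^2 * p_noise) equal 10^(snr_db / 10).
    gain = math.sqrt(p_sig / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(x, noise)]
```

Because the gain is computed from the realised noise power, not from its expected value, the SNR of the returned sequence matches `snr_db` up to rounding error.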

  26. Conclusions
      • MB excitation represents frication and aspiration accurately; esp. crucial for noisy speech.
      • Modulation aperiodicities are not captured at high pitches except through devoiced bands. Depending on the setting of thresholds, the noise bands may not fuse perceptually.
      • It is possible to simulate partially the perceptual effects of jitter/shimmer by the controlled devoicing of bands in the time-frequency (t-f) space.

  27. Thank you
