1 / 13

IEEE Speech Coding Workshop Sept 17–20, 2000 Lake Lawn Resort Delavan, WI

IEEE Speech Coding Workshop Sept 17–20, 2000 Lake Lawn Resort Delavan, WI Jean-Marc Valin, Roch Lefebvre University of Sherbrooke Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding. Outline. Problem statement Proposed solution System performance Discussion.

Download Presentation

IEEE Speech Coding Workshop Sept 17–20, 2000 Lake Lawn Resort Delavan, WI

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. IEEE Speech Coding Workshop Sept 17–20, 2000 Lake Lawn Resort Delavan, WI Jean-Marc Valin, Roch Lefebvre University of Sherbrooke Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding

  2. Outline • Problem statement • Proposed solution • System performance • Discussion

  3. Problem Statement • Telephone Band: 300 - 3400 Hz • AM Band: 50 - 7000 Hz • How to make sound like with 500 bits/sec? • We need to recover information from both low and high-frequency bands (G.729)

  4. Proposed Solution • 1) Do our best to recover the wideband information from narrowband speech • 2) Use coding for the information that cannot be recovered • Recovered information : • Low-frequency band • High-frequency excitation • Coded information : • High-frequency spectral envelope

  5. Low-frequency regeneration Inverse IRM Filter 2 High-frequency regeneration System Overview • Inverse IRM filter is optional • produces a flat response from 200-3500 Hz 50-300 Hz band 8 kHz narrowband 16 kHz wideband 300-3400 Hz band 3400-8000 Hz band Side information

  6. Low-Frequency Regeneration (1/2) • Assumptions : • Only pitch harmonics need to be recovered • In general, no more than two pitch harmonics below 200 Hz • Absolute phase is not perceptually relevant • Frequency of harmonics determined from pitch analysis • Amplitudes determined from feed-forward multi-layer perceptron (output in log domain)

  7. 2 LP filter Low-Frequency Regeneration (2/2) Low frequencies 1st harmonic Low-frequency harmonic synthesis Scale and sum 2nd harmonic Narrowband speech Pitch delay (1) Pitch analysis Pitch gain Multi-layer Perceptron (1) (16) Scale factors MFCC calculation Cepstral coefficients

  8. 1 B(z) High pass High-Frequency Extension • Excitation-filter model (16 ms frames) • Problem is separated in two parts • Excitation extension (no side information) • Spectral envelope coding (side information) Narrowband speech High- frequency band Excitation extension A(z) Spectral envelope Extension LPC analysis Side information (High-frequency spectral envelope)

  9. 2 Absolute value Narrowband excitation wideband excitation Whitening filter High pass Excitation Extension

  10. Spectral Envelope Coding • Spectral envelope calculated from the wideband LPC coefficients • Quantization of the 3000-8000 Hz range (40 points) • Log domain • 8-bit Vector Quantization (500 bits/s side information, using 16 ms frames) • Concatenation with envelope obtained from LPC analysis on narrowband speech

  11. Objective results • Low-frequency band • 3 dB RMS error on harmonic amplitude • High-frequency band • 3.6 dB RMS error on envelope • No objective measure for excitation extension (perceptually very close to original)

  12. Subjective Results female male Original wideband Recovered from original IRM-filtered speech Recovered from G.729 coded speech

  13. Discussion • Highlights • Expand IRM-filtered telephone-band speech to AM band • Very low side information rate (500 bits/s) • Areas of improvement • Use high-band spectral estimation before coding • Use residual low-frequency information (below 300 Hz) • Noise robustness • Post-filtering

More Related