
Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments

張智星 (Jang@cs.nthu.edu.tw), http://www.cs.nthu.edu.tw/~jang


Presentation Transcript


  1. Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments 張智星 Jang@cs.nthu.edu.tw http://www.cs.nthu.edu.tw/~jang

  2. Reference
     • Jialin Shen, Jeihweih Hung, Linshan Lee, "Robust entropy-based endpoint detection for speech recognition in noisy environments", International Conference on Spoken Language Processing, Sydney, 1998.

  3. Summary
     • An entropy-based algorithm for accurate and robust endpoint detection for speech recognition in noisy environments
     • Outperforms energy-based algorithms in both detection accuracy and recognition performance
     • Reported error reduction: 16%

  4. Motivation
     • Energy-based endpoint detection becomes less reliable in the presence of non-stationary noise and sound artifacts such as lip smacks, heavy breathing, and mouth clicks.
     • Spectral entropy is effective at distinguishing speech segments from non-speech parts.

  5. Spectral Entropy
     • PDF:
     • Normalization
     • Spectral entropy:
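As a rough illustration of the three steps named on this slide, the sketch below uses the standard textbook definition: the power spectrum s(f_i) of a frame is normalized into a probability mass function p_i = s(f_i) / Σ_k s(f_k), and the spectral entropy is H = -Σ_i p_i log p_i. The function names, the natural-log base, and the naive DFT are assumptions made for this example; any band limits or other constraints used in the referenced paper are not modeled.

```python
import cmath
import math


def power_spectrum(frame):
    """Power of each non-negative frequency bin, via a naive O(N^2) DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_entropy(frame, eps=1e-12):
    """Entropy (in nats) of the frame's normalized power spectrum."""
    power = power_spectrum(frame)
    total = sum(power)
    if total <= eps:
        # Silent frame: the PMF is undefined, so return 0 by convention.
        return 0.0
    probs = [s / total for s in power]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# A pure tone concentrates its energy in one bin (low entropy); a unit
# impulse spreads energy evenly over all bins (high entropy).
n = 64
tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
impulse = [1.0] + [0.0] * (n - 1)
print(spectral_entropy(tone), spectral_entropy(impulse))
```

In practice an FFT routine would replace the naive DFT; it is written out here only to keep the example dependency-free.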

  6. Properties of Entropy
     • Plots of entropy for N=2 and N=3 (entropyPlot.m)
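The properties such plots usually illustrate can be checked numerically: entropy is zero when all probability sits on one outcome and is largest (log N) for the uniform distribution. The sketch below is a Python stand-in written for this purpose, not a port of entropyPlot.m; the grid resolutions are arbitrary choices.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; terms with p == 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# N = 2: sweep p over a grid and locate the maximum of H(p, 1 - p).
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=lambda p: entropy([p, 1.0 - p]))

# N = 3: sweep a grid over the probability simplex.
steps = 30
simplex = [(i / steps, j / steps, (steps - i - j) / steps)
           for i in range(steps + 1) for j in range(steps + 1 - i)]
best_triple = max(simplex, key=entropy)

print(best_p, entropy([best_p, 1.0 - best_p]))    # peak for N = 2
print(best_triple, entropy(best_triple))           # peak for N = 3
```

On these grids the maximum falls at the uniform distribution in both cases, with values log 2 and log 3 respectively.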

  7. Entropy Weighting
     • A set of weighting factors can be applied:
     • These weighting factors are statistically estimated from a large collection of speech signals.
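The weighting formula itself does not appear in this transcript. One plausible form, assumed here purely for illustration, scales each term of the entropy sum by a per-bin weight w_i, giving H_w = -Σ_i w_i p_i log p_i. The weight values below are hypothetical and are not the statistically estimated factors the slide refers to.

```python
import math


def weighted_entropy(probs, weights):
    """Assumed form: -sum(w_i * p_i * log(p_i)); not confirmed by the slides."""
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    return -sum(w * p * math.log(p)
                for p, w in zip(probs, weights) if p > 0.0)


probs = [0.5, 0.25, 0.125, 0.125]
uniform = [1.0] * 4                # all-ones weights reduce to plain entropy
emphasis = [1.0, 1.0, 0.0, 0.0]    # hypothetical weights that ignore two bins
print(weighted_entropy(probs, uniform), weighted_entropy(probs, emphasis))
```

With all weights equal to 1 this reduces to the unweighted entropy, which makes the assumed form easy to sanity-check.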

  8. Endpoint Detection
     • The sum of the spectral entropy values over a duration of frames (20 frames) is first evaluated and then smoothed by a median filter.
     • Thresholds are used to detect the beginning and ending boundaries of the embedded speech segments.
     • A short period of background noise is first taken as the reference for an initial boundary detection process.
     • Short speech segments (<100 ms) are rejected.
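The four steps above can be sketched as follows, starting from a list of per-frame entropy values. Several details are not given in the slides and are assumptions of this sketch: the median filter order (5), the number of leading frames treated as background noise (10), a single threshold applied to the absolute deviation from the noise reference, and a 10 ms frame shift so that 100 ms corresponds to 10 frames. All function and parameter names are hypothetical.

```python
import statistics


def sliding_sum(values, width):
    """Sum over a window of `width` frames around each frame (edges padded)."""
    half = width // 2
    padded = ([values[0]] * half + list(values)
              + [values[-1]] * (width - 1 - half))
    return [sum(padded[i:i + width]) for i in range(len(values))]


def median_filter(values, order):
    """Median over `order` neighboring values (odd order, edges padded)."""
    half = order // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [statistics.median(padded[i:i + order]) for i in range(len(values))]


def detect_segments(entropy, window=20, median_order=5,
                    noise_frames=10, threshold=10.0, min_frames=10):
    """Return (first_frame, last_frame) pairs for detected segments."""
    smoothed = median_filter(sliding_sum(entropy, window), median_order)
    # Assumption: the leading frames contain background noise only.
    reference = statistics.mean(smoothed[:noise_frames])
    active = [abs(v - reference) > threshold for v in smoothed]

    segments, start = [], None
    for i, flag in enumerate(active + [False]):   # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_frames:           # reject short segments
                segments.append((start, i - 1))
            start = None
    return segments


# Synthetic entropy track: noise, a 30-frame "speech" stretch, noise,
# a 2-frame click, noise.
entropy_track = [1.0] * 40 + [3.0] * 30 + [1.0] * 40 + [3.0] * 2 + [1.0] * 40
print(detect_segments(entropy_track))
```

Comparing the absolute deviation from the noise reference keeps the sketch agnostic about whether speech raises or lowers the summed entropy relative to the background. Note that a 20-frame sum spreads each boundary over neighboring frames, so the detected endpoints depend on how the threshold is chosen.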

  9. Experiment Settings
     • Speech database
       - Isolated digits in Mandarin Chinese produced by 100 speakers (10 speakers for testing, the others for training)
     • Speech features: 12th-order MFCC and 12th-order delta MFCC
     • Models
       - Continuous-density HMM
       - 6 states per digit, 3 mixtures per state

  10. Experiment Settings
     • Noise
       - NOISEX-92 noise-in-speech database
       - White noise, pink noise, Volvo noise (car noise), F16 noise, machine gun noise
     • Sound artifacts
       - Breath noise, cough noise and mouse click noise

  11. Example

  12. Experimental Results

  13. Experimental Results

  14. Something Not Clear…
     • What is the sample rate? The bit resolution?
     • What are the frame size and overlap?
     • What is the order of the median filter?
     • How is the "short period of background noise" used?
     • What are the spectral entropy threshold values used to determine the boundaries?
     • What are the values of d1 and d2?
