Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments

張智星 Jang@cs.nthu.edu.tw http://www.cs.nthu.edu.tw/~jang

Reference
• Jialin Shen, Jeihweih Hung, Linshan Lee, “Robust entropy-based endpoint detection for speech recognition in noisy environments,” International Conference on Spoken Language Processing, Sydney, 1998

Summary
• An entropy-based algorithm for accurate and robust endpoint detection for speech recognition in noisy environments
• Outperforms energy-based algorithms in both detection accuracy and recognition performance
• Error reduction: 16%

Motivation
• Energy-based endpoint detection becomes less reliable in the presence of non-stationary noise and sound artifacts such as lip smacks, heavy breathing, and mouth clicks.
• Spectral entropy is effective in distinguishing speech segments from non-speech segments.

Spectral Entropy • PDF: • Normalization • Spectral entropy:
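
The formulas on this slide are not preserved in the transcript. As a hedged illustration, the sketch below uses the textbook definition of spectral entropy: the spectrum of a frame is normalized so that it sums to one, $p_k = s(f_k) / \sum_{j} s(f_j)$, and the entropy is $H = -\sum_{k} p_k \log p_k$. The use of the power spectrum, the natural logarithm, the naive DFT, and the function names `power_spectrum` and `spectral_entropy` are all assumptions made for this example, not details taken from the slide or the paper.

```python
import cmath
import math
import random

def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum for bins 0 .. N/2 (standard library only)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(x_k) ** 2)
    return spec

def spectral_entropy(frame):
    """H = -sum_k p_k log p_k, where p_k is the spectrum normalized to sum to 1."""
    spec = power_spectrum(frame)
    total = sum(spec)
    if total == 0.0:
        return 0.0  # all-zero frame: return 0 by convention (an assumption)
    probs = [s / total for s in spec]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

n = 64
# A pure tone puts nearly all of its energy in one frequency bin.
tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
# White noise spreads its energy across all bins.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

h_tone = spectral_entropy(tone)
h_noise = spectral_entropy(noise)
```

A spectrum concentrated in a few bins gives a low entropy and a flat spectrum gives a high one, which is the contrast the entropy feature relies on. A practical implementation would use an FFT routine rather than this quadratic-time DFT.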

Properties of Entropy • [Figure: entropy plots for N=2 and N=3, generated by entropyPlot.m]
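
The plots on this slide are not preserved. The short sketch below checks numerically the standard properties such plots usually illustrate: the entropy of an N-point distribution peaks at log N for the uniform distribution and drops to zero when all probability sits on one outcome. It is a stand-in written for this transcript, not a port of entropyPlot.m, and the helper name `entropy` is hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution p."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

# N = 2: entropy as a function of p1, with p2 = 1 - p1, sampled at 0.0, 0.1, ..., 1.0
curve = [(i / 10, entropy([i / 10, 1 - i / 10])) for i in range(11)]
peak_p1 = max(curve, key=lambda point: point[1])[0]

h_uniform_2 = entropy([0.5, 0.5])            # maximum for N = 2, equals log 2
h_uniform_3 = entropy([1 / 3, 1 / 3, 1 / 3])  # maximum for N = 3, equals log 3
h_spike = entropy([1.0, 0.0, 0.0])           # all mass on one outcome: zero entropy
```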

Entropy Weighting • A set of weighting factors can be applied: • These weighting factors are statistically estimated from a large collection of speech signals.
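
The weighting formula itself is not preserved in the transcript. One plausible form, assumed here rather than recovered from the slide, applies a weight $w_k$ to the contribution of each frequency component $f_k$, where $p_k$ is the normalized spectral value at $f_k$ and $N$ is the number of components:

```latex
H_w = -\sum_{k=1}^{N} w_k \, p_k \log p_k ,
\qquad
p_k = \frac{s(f_k)}{\sum_{j=1}^{N} s(f_j)}
```

With $w_k = 1$ for all $k$ this reduces to the unweighted spectral entropy.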

Endpoint Detection
• The sum of the spectral entropy values over a window of 20 frames is first evaluated and then smoothed by a median filter.
• Thresholds are used to detect the beginning and ending boundaries of the embedded speech segments.
• A short period of background noise is first taken as the reference for the initial boundary detection.
• Short speech segments (< 100 ms) are rejected.
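
The steps above can be sketched as follows, taking a sequence of per-frame spectral entropy values as input. The slide leaves several details open, so every numeric choice here is an assumption: the centered summing window, the median filter order (5), the number of leading noise frames (10), the single `threshold` on absolute deviation from the noise reference, and `min_frames=10` (100 ms at an assumed 10 ms frame shift). The function names are hypothetical, and the input data is synthetic.

```python
from statistics import mean, median

def sliding_sum(values, width):
    """Sum over a centered window of `width` frames, replicating the edge values."""
    left = width // 2
    right = width - 1 - left
    padded = [values[0]] * left + list(values) + [values[-1]] * right
    return [sum(padded[i:i + width]) for i in range(len(values))]

def median_filter(values, order):
    """Median over a centered window of `order` samples, clipped at the edges."""
    half = order // 2
    return [median(values[max(0, i - half):i + half + 1]) for i in range(len(values))]

def detect_segments(entropy, threshold, noise_frames=10, sum_width=20,
                    median_order=5, min_frames=10):
    """Return (start, end) frame index pairs, with `end` exclusive."""
    smoothed = median_filter(sliding_sum(entropy, sum_width), median_order)
    # The leading frames are assumed to contain background noise only.
    reference = mean(smoothed[:noise_frames])
    # A frame is active when it deviates from the noise reference in either direction.
    active = [abs(s - reference) > threshold for s in smoothed]
    segments, start = [], None
    for i, flag in enumerate(active + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_frames:  # reject segments that are too short
                segments.append((start, i))
            start = None
    return segments

# Synthetic contour: noise, a 30-frame segment, noise, a 3-frame blip, noise.
entropy = [3.0] * 40 + [1.5] * 30 + [3.0] * 40 + [1.5] * 3 + [3.0] * 30
segments = detect_segments(entropy, threshold=14.0)
```

Thresholding the absolute deviation avoids assuming whether speech raises or lowers the entropy relative to the background. The 3-frame blip never moves the 20-frame sum far enough from the reference to cross the threshold, so only the long segment is reported.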

Experiment Settings
• Speech database
  • Isolated digits in Mandarin Chinese produced by 100 speakers (10 speakers for testing, the others for training)
  • Speech features: 12th-order MFCC and 12th-order delta MFCC
• Models
  • Continuous-density HMM
  • 6 states per digit, 3 mixtures per state

Experiment Settings
• Noise
  • NOISEX-92 noise-in-speech database
  • White noise, pink noise, Volvo noise (car noise), F16 noise, machine gun noise
• Sound artifacts
  • Breath noise, cough noise, and mouse click noise

Example

Experimental Results

Something Not Clear…
• What are the sample rate and bit resolution?
• What are the frame size and overlap?
• What is the order of the median filter?
• How is the “short period of background noise” used?
• What are the spectral entropy threshold values used to determine the boundaries?
• What are the values of d1 and d2?
