RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise. Amin Fazel Sharif University of Technology Hossein Sameti, Mohammad T. Manzuri February 2005.
Computer Engineering Department, Sharif University of Technology
Outline
• Introduction
• Feature based methods
  • MFCC, RCC, CMN, PLP, RASTA
• Mean Normalization Root Cepstral Coefficients
• Experimental Results
  • Experiment 1 – Sharif CSR and TFARSDAT Database
  • Experiment 2 – HTK CSR and AURORA 2 Database
• Summary
Effect of Noise on ASR
• Two phases in most ASR systems
  • Training
  • Operation (testing)
• Mismatch between the two phases reduces accuracy
• Mismatch occurs because of
  • Environment
    • Microphone, babble, distance, transmission channel
  • Speaker
    • Specific speaker: speed, …
    • Various speakers: gender, age, accent, …
Effect of Noise on ASR
[Figure: noise divided into stationary and non-stationary]
• Noise
  • Additive noise
    • Babble, car, subway
    • Exhibition hall, office, …
  • Convolutional noise
    • Channel, telephone line
    • Microphone effect
    • Distance of speaker to microphone
  • Others
    • Lombard effect, reflections from buildings
Effect of Noise on ASR
• Simple model
[Figure: clean speech passes through convolutional noise, additive noise is then added, and the result is the corrupted speech]
• Robust speech recognition is the study of building speech recognizers that handle mismatched conditions.
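
The simple model on this slide can be sketched in a few lines. This is only an illustration under assumptions that are not in the slides: the channel is a short linear filter, the noise is white Gaussian, and the names `corrupt`, `channel`, and `noise` are hypothetical.

```python
import numpy as np

def corrupt(clean, channel, noise):
    """Return y = (x convolved with h) + n, truncated to the length of x."""
    filtered = np.convolve(clean, channel)[: len(clean)]  # convolutional distortion
    return filtered + noise                                # additive distortion

rng = np.random.default_rng(0)
clean = rng.standard_normal(1000)        # stand-in for a clean speech signal
channel = np.array([1.0, 0.5, 0.25])     # hypothetical channel impulse response
noise = 0.1 * rng.standard_normal(1000)  # hypothetical additive noise
corrupted = corrupt(clean, channel, noise)
```

With an identity channel and zero noise the function returns the clean signal unchanged, which is the matched condition.
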
Robustness Methods
[Figure: in both the training phase and the testing phase, the speech signal goes through feature extraction to give features for the model; the training phase also includes model training]
• Signal
  • Speech enhancement
• Feature
  • Robust feature extraction
• Model
  • Change of the model parameters
  • Model training
Mel-Frequency Cepstral Coefficients
• Compute the magnitude-squared of the Fourier transform
• Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
• Take the log of the outputs (for RCC, take a root instead of the log)
• Compute the cepstrum using the discrete cosine transform
• Smooth by dropping the higher-order coefficients
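
The five steps above can be sketched for a single frame as follows. This is a minimal sketch, not the authors' front end: the mel formula, the 23 filters, the 8 kHz sample rate, the 256-point FFT, and the root exponent `gamma=0.1` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular weights with centers spaced evenly on the mel scale."""
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i, k] = (right - k) / (right - center)
    return fbank

def cepstra(frame, fbank, n_ceps=13, compression="log", gamma=0.1):
    """Cepstral coefficients of one windowed frame (log -> MFCC, root -> RCC)."""
    n_fft = 2 * (fbank.shape[1] - 1)
    spectrum = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2   # magnitude-squared
    energies = fbank @ spectrum                           # triangular weighting
    if compression == "log":
        compressed = np.log(energies + 1e-10)             # small floor avoids log(0)
    else:
        compressed = energies ** gamma                    # root compression
    n_filters = fbank.shape[0]
    n = np.arange(n_ceps)[:, None]
    k = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi * n * (k + 0.5) / n_filters)  # unnormalized DCT-II
    return dct_basis @ compressed                          # keep low-order terms only

sample_rate = 8000                                   # assumed telephone bandwidth
rng = np.random.default_rng(0)
frame = rng.standard_normal(200) * np.hamming(200)   # stand-in for a 25 ms frame
fbank = mel_filterbank(23, 256, sample_rate)
mfcc = cepstra(frame, fbank, compression="log")
rcc = cepstra(frame, fbank, compression="root")
```

The only difference between the two feature types in this sketch is the compression step, which matches the parenthetical note on the slide.
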
Temporal processing
• Captures the temporal features of the spectral envelope and provides robustness
• Delta features: first- and second-order differences; regression
• Cepstral mean subtraction: normalizes for channel effects and adjusts for spectral slope
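
Both operations on this slide act along the time axis of a feature matrix. The sketch below assumes features stored as `(n_frames, n_ceps)`, a per-utterance mean, and a regression window of ±2 frames; these choices and the function names are illustrative, not taken from the slides.

```python
import numpy as np

def cepstral_mean_subtraction(features):
    """Subtract the per-utterance mean of each coefficient."""
    return features - features.mean(axis=0, keepdims=True)

def delta(features, width=2):
    """Regression-based delta over +/- width frames (edges padded by repetition)."""
    n_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(t * t for t in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for t in range(1, width + 1):
        ahead = padded[width + t : width + t + n_frames]    # c[t + theta]
        behind = padded[width - t : width - t + n_frames]   # c[t - theta]
        out += t * (ahead - behind)
    return out / denom

rng = np.random.default_rng(0)
static = rng.standard_normal((50, 13))     # stand-in for 50 frames of cepstra
normalized = cepstral_mean_subtraction(static)
d1 = delta(normalized)                     # first-order (delta)
d2 = delta(d1)                             # second-order (delta-delta)
full = np.hstack([normalized, d1, d2])     # 39-dimensional feature vectors
```

A constant offset added to every frame, which is how a fixed channel appears in the cepstral domain, is removed by the mean subtraction.
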
Perceptual Linear Prediction (PLP)
• Compute the magnitude-squared of the Fourier transform
• Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
• Apply compressive nonlinearities
• Compute the discrete cosine transform
• Smooth using autoregressive modeling
• Compute the cepstrum using a linear recursion
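
The last step, going from autoregressive coefficients to cepstral coefficients, is commonly done with the recursion sketched below. The sign convention is an assumption: the all-pole model is taken as H(z) = 1 / (1 - Σ a_k z^-k), and the gain term c_0 is ignored. The function name is hypothetical.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum of H(z) = 1 / (1 - sum_k a_k z^-k); a[k-1] holds a_k."""
    p = len(a)
    c = np.zeros(n_ceps + 1)          # c[0] (gain term) is not computed here
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:            # a_{n-k} is zero beyond the model order
                acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

one_pole = lpc_to_cepstrum(np.array([0.5]), 4)
```

For a single pole at 0.5 the cepstrum is known in closed form, c_n = 0.5^n / n, which gives a quick check of the recursion.
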
PLP (Cont.)
• Algorithm
[Figure: Speech signal → Critical Band Analysis → Equal Loudness Pre-Emphasis → Intensity-Loudness Conversion → Inverse DFT → Find Autoregressive Coefficients → All pole model]
RelAtive SpecTral Analysis (RASTA)
• Makes PLP (and possibly some other short-term spectrum-based techniques) more robust to linear spectral distortions
• The new spectral estimate is less sensitive to slow variations in the short-term spectrum
• Filters the temporal trajectory of some function of each spectral value to provide more reliable spectral features
• The filter is usually a bandpass filter that maintains the linguistically important spectral envelope modulations (1-16 Hz)
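
The trajectory filtering described above can be sketched as one IIR filter applied to each band over time. The coefficients below are the values commonly cited for the RASTA filter in the literature, not values given in these slides, and the causal form used here delays the output by a few frames. The input layout `(n_frames, n_bands)` and the function name are assumptions.

```python
import numpy as np

def rasta_filter(log_spectra, pole=0.98):
    """Band-pass filter each band's time trajectory.

    Implements y[t] = sum_j b[j] * x[t - j] + pole * y[t - 1].
    """
    numer = np.array([0.2, 0.1, 0.0, -0.1, -0.2])   # sums to zero: no DC gain
    n_frames, n_bands = log_spectra.shape
    out = np.zeros((n_frames, n_bands))
    for t in range(n_frames):
        acc = np.zeros(n_bands)
        for j, b in enumerate(numer):
            if t - j >= 0:
                acc += b * log_spectra[t - j]
        if t > 0:
            acc += pole * out[t - 1]
        out[t] = acc
    return out

constant = np.full((500, 3), 5.0)     # a fixed offset, as a fixed channel would add
filtered = rasta_filter(constant)
```

Because the numerator sums to zero, a constant offset in the log spectrum decays away, which is the sense in which the filtered estimate is less sensitive to slow variations.
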
RASTA (Cont.)
• Algorithm
[Figure: Speech signal → Spectral analysis → Bank of compressing static nonlinearities → Bank of linear bandpass filters → Bank of expanding static nonlinearities → Optional processing]
RASTA-PLP
• Algorithm
RCC-Mean Normalization
• Root Cepstral Coefficients (RCC)
  • Derived using root compression rather than log compression on the filterbank energies
• Advantages of RCC over MFCC
  • More immune to noise
  • Faster decoding
RCC-Mean Normalization
• Mean normalization
  • If we approximate the root with a logarithm
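
The equations for this slide are not preserved in the transcript. One way to see why the approximation matters is the following sketch, which is a reconstruction under stated assumptions and not necessarily the authors' derivation: the channel multiplies the filterbank energy of band i by a fixed gain H_i, the root exponent γ is small, and the symbols X, Y, H, γ, and T (number of frames) are all introduced here for illustration.

```latex
\begin{align}
Y_i(t) &= H_i \, X_i(t) \\
\ln Y_i(t) &= \ln H_i + \ln X_i(t) \\
Y_i(t)^{\gamma} &= e^{\gamma \ln Y_i(t)} \approx 1 + \gamma \ln Y_i(t)
  = 1 + \gamma \ln H_i + \gamma \ln X_i(t) \\
Y_i(t)^{\gamma} - \frac{1}{T}\sum_{\tau=1}^{T} Y_i(\tau)^{\gamma}
  &\approx \gamma \left( \ln X_i(t) - \frac{1}{T}\sum_{\tau=1}^{T} \ln X_i(\tau) \right)
\end{align}
```

Under this approximation the channel term becomes an additive constant in the root domain, so subtracting the mean over time removes it in the same way cepstral mean subtraction does for log-compressed features.
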
Experiment 1
• Database
  • TFARSDAT
  • 64 speakers
  • 8 hours of telephone speech data
• ASR
  • Sharif ASR System
  • HMM based
  • Training: Segmental K-means
  • Search: Beam Viterbi
Experiment 1
• Test results
Experiment 2
• Aurora 2.0
  • Noisy connected digits recognition
  • 4 hours of training data, 2 hours of test data in 70 noise type/SNR conditions
• HTK
  • HMM based
  • Model for each digit
  • 16 states with 3 Gaussian mixtures
Experiment 2
• Average results on AURORA
  • Each average is taken over the various SNRs of one noise type
Experiment 2
• Subway noise at various SNRs
Experiment 2
• Babble noise at various SNRs
Experiment 2
• Car noise at various SNRs
Experiment 2
• Exhibition noise at various SNRs
Summary
• Various robust features were tested
• RCC_MN was introduced
• In the first experiment
  • RASTA-PLP
  • Although RCC_MN is also good
• In the second experiment
  • RCC_MN