Amin Fazel Sharif University of Technology Hossein Sameti, Mohammad T. Manzuri February 2005

RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise
Computer Engineering Department, Sharif University of Technology

Outline
• Introduction
• Feature-based methods
  • MFCC, RCC, CMN, PLP, RASTA
• Mean Normalization Root Cepstral Coefficients
• Experimental Results
  • Experiment 1 – Sharif CSR and TFARSDAT Database
  • Experiment 2 – HTK CSR and AURORA 2 Database
• Summary

Effect of Noise on ASR
• Two phases in most ASR systems
  • Training
  • Operation (testing)
• Mismatch between the two phases reduces accuracy
• Mismatch occurs because of
  • Environment
    • Microphone, babble, distance, transmission channel
  • Speaker
    • Specific speaker: speed, …
    • Various speakers: gender, age, accent, …

Effect of Noise on ASR
[Diagram: noise, divided into stationary and non-stationary]
• Noise
  • Additive noise
    • Babble, car, subway
    • Exhibition, office, …
  • Convolutional noise
    • Channel, telephone line
    • Microphone effect
    • Distance from speaker to microphone
  • Others
    • Lombard effect, reflections from buildings

Effect of Noise on ASR
• Simple model
[Diagram: clean speech passes through convolutional noise, additive noise is then added, and the result is the corrupted speech]
• Robust speech recognition is the study of building speech recognition systems that handle mismatched conditions.
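The simple model above can be illustrated with a short sketch. It assumes the usual reading of such a diagram, y[n] = (x * h)[n] + d[n], where x is the clean speech, h a channel impulse response, and d the additive noise. The function name `corrupt` and the toy signals are hypothetical and only serve the illustration.

```python
def corrupt(clean, channel, noise):
    """Return y[n] = (clean * channel)[n] + noise[n], truncated to len(clean)."""
    out = []
    for n in range(len(clean)):
        # Convolutional distortion: weighted sum of current and past clean samples.
        acc = 0.0
        for k, h in enumerate(channel):
            if n - k >= 0:
                acc += h * clean[n - k]
        # Additive distortion: background noise is added on top.
        out.append(acc + noise[n])
    return out


# Toy signals (assumed values, not taken from the slides).
clean = [1.0, 0.0, 0.0, 2.0]
channel = [1.0, 0.5]          # a two-tap "telephone line"
noise = [0.1, 0.1, 0.1, 0.1]  # constant background level
noisy = corrupt(clean, channel, noise)
```

The channel smears each clean sample into the next one, while the noise shifts every sample. A recognizer trained on `clean` and tested on `noisy` sees exactly the kind of mismatch described above.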

Robustness Methods
[Diagram: training phase and testing phase, each going from speech signal to features to model, with feature extraction and model training blocks]
• Signal
  • Speech enhancement
• Feature
  • Robust feature extraction
• Model
  • Change of the model parameters
  • Model training

Mel-Frequency Cepstral Coefficients
• Compute the magnitude-squared of the Fourier transform
• Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
• Take the log of the outputs (for RCC, take a root instead of the log)
• Compute the cepstrum using the discrete cosine transform
• Smooth by dropping higher-order coefficients
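The five steps above can be sketched for a single frame as follows. This is a minimal illustration, not the authors' front end: the mel formula, the filter count, the root exponent, the small energy floor, and all function names are assumptions, and windowing and pre-emphasis are left out. Passing `root=None` gives log compression (MFCC), while a numeric `root` gives root compression (RCC).

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Step 1: magnitude-squared DFT for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(abs(s) ** 2)
    return spec


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Step 2: triangular weights with centers equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bins = [e * n_fft / sample_rate for e in edges]  # fractional bin positions
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def cepstrum(frame, sample_rate, num_filters=12, num_ceps=6, root=None):
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Small floor (an assumption) keeps the log defined for empty bands.
    energies = [sum(w * p for w, p in zip(row, spec)) + 1e-10 for row in bank]
    # Step 3: log compression (MFCC) or root compression (RCC).
    if root is None:
        compressed = [math.log(e) for e in energies]
    else:
        compressed = [e ** root for e in energies]
    # Steps 4 and 5: DCT-II, keeping only the first num_ceps coefficients.
    return [
        sum(c * math.cos(math.pi * i * (j + 0.5) / num_filters)
            for j, c in enumerate(compressed))
        for i in range(num_ceps)
    ]


# A 64-sample test tone at 8 kHz (assumed values).
frame = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(64)]
mfcc = cepstrum(frame, 8000)
rcc = cepstrum(frame, 8000, root=0.1)
```

The only difference between the two feature types in this sketch is the single compression line, which matches the parenthetical remark in the third step.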

Temporal Processing
• Captures the temporal features of the spectral envelope and provides robustness:
  • Delta features: first- and second-order differences; regression
  • Cepstral mean subtraction: normalizes for channel effects and adjusts for spectral slope
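Both temporal operations above are short enough to sketch directly. The code assumes features arrive as a list of frames, each a list of cepstral coefficients. The function names, the regression width, and the edge-padding rule (repeating the first and last frame) are illustrative assumptions.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over the utterance from every frame."""
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


def delta(frames, width=2):
    """Regression-based delta features; edge frames are repeated as padding."""
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for k in range(1, width + 1):
                nxt = frames[min(t + k, num_frames - 1)][d]
                prv = frames[max(t - k, 0)][d]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


# Toy utterance of three 2-dimensional frames (assumed values).
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
# The same utterance through a channel that adds a fixed cepstral offset.
shifted = [[f[0] + 5.0, f[1] - 3.0] for f in frames]

normalized = cepstral_mean_subtraction(frames)
normalized_shifted = cepstral_mean_subtraction(shifted)
deltas = delta(frames, width=1)
```

A fixed channel adds a constant to every log-cepstral frame, so `normalized` and `normalized_shifted` come out identical. Applying `delta` to its own output gives the second-order differences.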

Perceptual Linear Prediction (PLP)
• Compute the magnitude-squared of the Fourier transform
• Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
• Apply compressive nonlinearities
• Compute the discrete cosine transform
• Smooth using autoregressive modeling
• Compute the cepstrum using a linear recursion

PLP (Cont.)
• Algorithm
[Block diagram: Speech signal → Critical Band Analysis → Equal Loudness Pre-Emphasis → Intensity-Loudness Conversion → Inverse DFT → Find Autoregressive Coefficients → All-pole model]
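The back half of the PLP algorithm (intensity-loudness conversion, inverse DFT, autoregressive coefficients, cepstral recursion) can be sketched as below. This is an illustration under assumptions: critical-band analysis and equal-loudness pre-emphasis are taken as already applied to the input, the cube root is used as the compressive nonlinearity, Levinson-Durbin is used to find the all-pole model, and all function names are hypothetical.

```python
import math


def intensity_to_loudness(band_energies):
    """Cube-root compression (intensity-loudness conversion)."""
    return [e ** (1.0 / 3.0) for e in band_energies]


def autocorrelation_from_spectrum(spec, order):
    """Inverse DFT of a real, even spectrum given at bins 0..K."""
    k_max = len(spec) - 1
    n_fft = 2 * k_max
    r = []
    for lag in range(order + 1):
        acc = spec[0] + spec[k_max] * math.cos(math.pi * lag)
        for k in range(1, k_max):
            acc += 2.0 * spec[k] * math.cos(2.0 * math.pi * k * lag / n_fft)
        r.append(acc / n_fft)
    return r


def levinson_durbin(r, order):
    """All-pole coefficients a[0..order] (a[0] = 1) and residual error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a, num_ceps):
    """Cepstrum c[1..num_ceps] of the model 1/A(z) via the linear recursion."""
    order = len(a) - 1
    c = [0.0] * (num_ceps + 1)
    for n in range(1, num_ceps + 1):
        acc = a[n] if n <= order else 0.0
        for k in range(1, n):
            if n - k <= order:
                acc += (k / n) * c[k] * a[n - k]
        c[n] = -acc
    return c[1:]


def plp_cepstrum(band_energies, order, num_ceps):
    loudness = intensity_to_loudness(band_energies)
    r = autocorrelation_from_spectrum(loudness, order)
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, num_ceps)


# Sanity check on a first-order process with autocorrelation 0.5 ** lag.
a, err = levinson_durbin([1.0, 0.5, 0.25], 2)
ceps = lpc_to_cepstrum(a, 3)
```

For the first-order example, the recursion recovers the single pole at 0.5, and the cepstrum follows 0.5 ** n / n. The autoregressive fit is what gives PLP its smoothed spectral envelope.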

RelAtive SpecTrAl (RASTA) Analysis
• Makes PLP (and possibly other short-term spectrum-based techniques) more robust to linear spectral distortions
• The new spectral estimate is less sensitive to slow variations in the short-term spectrum
• Filters the temporal trajectories of some function of each spectral value to provide more reliable spectral features
• The filter is usually a bandpass filter that maintains the linguistically important spectral envelope modulations (1-16 Hz)

RASTA (Cont.)
• Algorithm
[Block diagram: Speech signal → Spectral analysis → Bank of compressing static nonlinearities → Bank of linear bandpass filters → Bank of expanding static nonlinearities → Optional processing]
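The compress, filter, expand chain can be sketched for one spectral band as below. The slides do not state the filter coefficients. The sketch assumes the commonly cited RASTA filter, with numerator 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) and a single pole at 0.98, applied causally, and assumes log and exp as the compressing and expanding nonlinearities. The function names are hypothetical.

```python
import math


def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter one band's time trajectory (one value per frame)."""
    padded = [0.0] * 4 + list(trajectory)  # zero history before the first frame
    out = []
    prev = 0.0
    for t in range(len(trajectory)):
        # FIR part: 0.1 * (2 x[t] + x[t-1] - x[t-3] - 2 x[t-4]); taps sum to zero.
        fir = 0.1 * (2.0 * padded[t + 4] + padded[t + 3]
                     - padded[t + 1] - 2.0 * padded[t])
        # IIR part: single pole that sets the low cut-off.
        prev = pole * prev + fir
        out.append(prev)
    return out


def rasta_band(energies):
    """Compress (log), band-pass filter, then expand (exp) one band."""
    logs = [math.log(e) for e in energies]
    filtered = rasta_filter(logs)
    return [math.exp(v) for v in filtered]


# Toy band energy trajectory and the same band through a 10x channel gain.
energies = [2.0 + math.sin(0.3 * t) for t in range(1500)]
scaled = [10.0 * e for e in energies]

out_plain = rasta_band(energies)
out_scaled = rasta_band(scaled)
```

Because the numerator taps sum to zero, a constant in the log domain (that is, a fixed channel gain) is rejected after an initial transient, so `out_plain` and `out_scaled` converge. This is the sense in which the estimate is insensitive to slow variations.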

RASTA-PLP
• Algorithm

RCC-Mean Normalization
• Root Cepstral Coefficients (RCC)
  • Derived using root compression rather than log compression on the filterbank energies
• Advantages of RCC over MFCC
  • More immune to noise
  • Faster decoding

RCC-Mean Normalization
• Mean normalization
• If we approximate the root with the logarithm
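The remark about approximating the root with the logarithm can be read along the following lines. This is only a sketch under stated assumptions, not the authors' derivation: it assumes a time-invariant channel gain H_k that multiplies the clean filterbank energy X_k in band k, a root exponent γ, and a first-order expansion that holds when |γ log Y_k| is small. All symbols are introduced here for illustration, and the overline denotes the average over the frames of an utterance.

```latex
\begin{align}
Y_k &= X_k H_k \\
\log Y_k &= \log X_k + \log H_k \\
Y_k^{\gamma} = \exp(\gamma \log Y_k) &\approx 1 + \gamma \log Y_k
  = 1 + \gamma \log X_k + \gamma \log H_k \\
Y_k^{\gamma} - \overline{Y_k^{\gamma}} &\approx
  \gamma \left( \log X_k - \overline{\log X_k} \right)
\end{align}
```

Under this approximation the channel term becomes additive after root compression, as it is after log compression, so subtracting the utterance mean removes it. Since the DCT is linear, the same argument carries over to the cepstral coefficients.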

Experiment 1
• Database
  • TFARSDAT
  • 64 speakers
  • 8 hours of telephone speech data
• ASR
  • Sharif ASR system
  • HMM based
  • Training: segmental K-means
  • Search: beam Viterbi

Experiment 1
• Test results

Experiment 2
• Aurora 2.0
  • Noisy connected-digit recognition
  • 4 hours of training data, 2 hours of test data in 70 noise type/SNR conditions
• HTK
  • HMM based
  • One model for each digit
  • 16 states with 3 Gaussian mixtures

Experiment 2
• Average results on AURORA
  • Averages obtained over the various SNRs of each noise

Experiment 2
• Subway noise at various SNRs

Experiment 2
• Babble noise at various SNRs

Experiment 2
• Car noise at various SNRs

Experiment 2
• Exhibition noise at various SNRs

Summary
• Various robust features were tested
• Introduction of RCC_MN
• In the first experiment
  • RASTA-PLP
  • Although RCC_MN is also good
• In the second experiment
  • RCC_MN

Thanks for your patience!
