HIWIRE MEETING, Torino, March 9-10, 2006. José C. Segura, Javier Ramírez
Schedule • HIWIRE database evaluations • New results: HEQ and PEQ • Non-linear feature normalization • Using temporal redundancy • HEQ integration in Loquendo platform • Recursive estimation of the equalization function • New improvements in robust VAD • Bispectrum-based VAD • SVM-enabled VAD
Temporal redundancy in HEQ • Enhance the normalization by adding a linear transformation to restore temporal correlations • Each equalized cepstral coefficient is time-filtered with an ARMA filter that restores the autocorrelation of clean data (sketched below) • [Results charts: AURORA4 and AURORA2 (clean test)]
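A minimal sketch of the idea on this slide, not the authors' implementation: each equalized cepstral trajectory is passed through a per-coefficient ARMA filter. The filter coefficients below are illustrative placeholders; in the HIWIRE work they would be chosen so that the filtered features match the autocorrelation of clean training data.

```python
import numpy as np
from scipy.signal import lfilter

def arma_smooth(ceps_eq, b=(0.6, 0.4), a=(1.0, -0.2)):
    """Time-filter HEQ-equalized cepstra with a per-coefficient ARMA filter.

    ceps_eq : (n_frames, n_coeffs) array of equalized cepstral coefficients.
    b, a    : illustrative ARMA coefficients (assumption); they would be fitted
              so the output autocorrelation matches that of clean data.
    """
    return np.column_stack(
        [lfilter(b, a, ceps_eq[:, k]) for k in range(ceps_eq.shape[1])]
    )
```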
HEQ integration in Loquendo platform • Currently implemented: HIGH-MISMATCH SEGMENTAL • New proposal: SENTENCE-BY-SENTENCE RECURSIVE
HEQ integration (recursive estimation) (1) • Current approach: Gaussian HEQ using the ECDF (see the formula below) • Using quantiles
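A reconstruction of the formula presumably shown here (the slide equations were lost with the images): Gaussian HEQ maps each cepstral coefficient through the empirical CDF of the current utterance and the inverse CDF of a zero-mean, unit-variance Gaussian reference.

```latex
% Assumed form of Gaussian HEQ via the ECDF (reconstruction, not verbatim from the slide)
\[
  \hat{x} \;=\; \Phi^{-1}\!\bigl(\hat{F}_x(x)\bigr),
  \qquad
  \hat{F}_x(x) \;=\; \frac{1}{N}\sum_{t=1}^{N} \mathbf{1}\{x_t \le x\}
\]
```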
HEQ integration (recursive estimation) (2) • Equalization by linear interpolation, mapping corresponding quantiles • Reference quantiles averaged over training data; current quantiles estimated from the current utterance (see the sketch after the next slide)
HEQ integration (recursive estimation) (4) • Utterances are equalized WITHOUT delay • Quantiles are updated AFTER the equalization
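A minimal sketch of the scheme on slides (2) and (4), assuming an exponentially weighted quantile update; the number of quantiles, the forgetting factor, and the function names are illustrative, not the Loquendo implementation.

```python
import numpy as np

N_Q = 9                                  # number of quantiles (illustrative)
PROBS = np.linspace(0.1, 0.9, N_Q)
ALPHA = 0.99                             # forgetting factor (assumption)

def equalize(x, q_cur, q_ref):
    """Equalize one cepstral coefficient by mapping corresponding quantiles
    (current utterance -> training reference) with linear interpolation."""
    return np.interp(x, q_cur, q_ref)

def update_quantiles(q_cur, x):
    """Recursive update of the current quantile estimates."""
    return ALPHA * q_cur + (1.0 - ALPHA) * np.quantile(x, PROBS)

def process_utterance(x, q_cur, q_ref):
    y = equalize(x, q_cur, q_ref)        # equalized WITHOUT delay
    q_cur = update_quantiles(q_cur, x)   # quantiles updated AFTER equalization
    return y, q_cur
```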
HIWIRE MEETING, Torino, March 9-10, 2006. José C. Segura, Javier Ramírez
Schedule • HIWIRE database evaluations • New results: HEQ and PEQ • Non-linear feature normalization • Using temporal redundancy • HEQ integration in Loquendo platform • Recursive estimation of the equalization function • New improvements in robust VAD • Bispectrum-based VAD • SVM-enabled VAD
Bispectrum-based VAD (1) • Motivations: • Ability of HOS methods to detect signals in noise • Knowledge of the input processes (Gaussian) • Issues to be addressed: • Computationally expensive • Variance of bispectrum estimators much higher than that of power spectral estimators (identical data record size) • Solution: Integrated bispectrum • J. K. Tugnait, “Detection of non-Gaussian signals using integrated polyspectrum,” IEEE Trans. on Signal Processing, vol. 42, no. 11, pp. 3137–3149, 1994.
Bispectrum-based VAD (2) • Definitions: let x(t) be a discrete-time signal • Bispectrum, third-order cumulants, and inverse transform (equations reconstructed below)
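The equations on this slide were images and did not survive; below is a reconstruction of the standard definitions, consistent with the Tugnait reference cited above, for a zero-mean discrete-time signal x(t).

```latex
% Reconstructed standard definitions (assumed to match the original slide)
\begin{align*}
  c_{3x}(m,n) &= E\{x(t)\,x(t+m)\,x(t+n)\}
      && \text{(third-order cumulants)} \\
  C_{3x}(\omega_1,\omega_2) &= \sum_{m}\sum_{n} c_{3x}(m,n)\,
      e^{-j(\omega_1 m + \omega_2 n)}
      && \text{(bispectrum)} \\
  c_{3x}(m,n) &= \frac{1}{(2\pi)^2}\int_{-\pi}^{\pi}\!\!\int_{-\pi}^{\pi}
      C_{3x}(\omega_1,\omega_2)\, e^{\,j(\omega_1 m + \omega_2 n)}\,
      d\omega_1\, d\omega_2
      && \text{(inverse transform)}
\end{align*}
```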
Bispectrum-based VAD (3) • [Bispectrum magnitude plots: noise only vs. noise + speech]
Bispectrum-based VAD (4) • Integrated bispectrum (IBI): the cross-spectrum Syx(ω) between the signal and its square • Equivalently, the bispectrum integrated over one frequency, or the third-order cumulants with one lag set to zero (i = 0); see the reconstruction below
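A reconstruction of the relations sketched on this slide, assumed to follow the Tugnait (1994) formulation: with y(t) = x²(t) − E{x²(t)},

```latex
% Integrated bispectrum as a cross spectrum (reconstruction)
\begin{align*}
  c_{yx}(m) &= E\{y(t)\,x(t+m)\} \;=\; c_{3x}(0,m)
      && \text{(one cumulant lag set to zero)} \\
  S_{yx}(\omega) &= \sum_{m} c_{yx}(m)\, e^{-j\omega m}
      \;=\; \frac{1}{2\pi}\int_{-\pi}^{\pi} C_{3x}(\xi,\omega)\, d\xi
      && \text{(bispectrum integrated over one frequency)}
\end{align*}
```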
Bispectrum-based VAD (5) • Integrated bispectrum (IBI): defined as a cross spectrum between the signal and its square, and therefore a function of a single frequency variable • Benefits: • Lower computational cost (computed as a cross spectrum) • Variance of the same order as that of the power-spectrum estimator • Properties: for Gaussian processes both the bispectrum and the integrated bispectrum are zero
Bispectrum-based VAD (6) • Two alternatives explored for formulating the decision rule: • Estimation by block averaging (BA) • MO-LRT: given a set of N = 2m+1 consecutive observations (general form sketched below)
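A sketch of the multiple-observation LRT in its general form, assuming statistically independent observations; the specific likelihoods used in this work would be the Gaussian model of the IBI coefficients mentioned on the next slide.

```latex
% MO-LRT over N = 2m+1 observations around frame l (general form, assumption)
\[
  \ell_{\mathrm{MO}}(l) \;=\; \sum_{k=l-m}^{l+m}
      \ln \frac{p(\mathbf{x}_k \mid H_1)}{p(\mathbf{x}_k \mid H_0)}
  \;\;\mathop{\gtrless}_{H_0}^{H_1}\;\; \eta
\]
```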
Bispectrum-based VAD (7) • LRT evaluation under a Gaussian model of the IBI • Variances defined in terms of: • Sss (clean-speech power spectrum) • Snn (noise power spectrum)
Bispectrum-based VAD (8) • Denoising: two cascaded Wiener filter (WF) stages • [Block diagram: 1st WF design and stage, smoothed spectral subtraction, 2nd WF design and stage, 1-frame delay] • A simplified sketch follows
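A heavily simplified sketch of a two-stage Wiener-filter front end of the kind the block diagram describes; the gain rule, the flooring, and the omission of the smoothing and 1-frame-delay details are assumptions for illustration only.

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, floor=0.05):
    """Wiener gain from a floored spectral-subtraction clean-speech estimate."""
    clean_est = np.maximum(noisy_psd - noise_psd, floor * noisy_psd)
    return clean_est / (clean_est + noise_psd)

def two_stage_wf(noisy_psd, noise_psd):
    """Apply two cascaded Wiener-filter stages to one frame's power spectrum."""
    stage1 = wiener_gain(noisy_psd, noise_psd) * noisy_psd   # 1st WF stage
    stage2 = wiener_gain(stage1, noise_psd) * stage1         # 2nd WF stage
    return stage2
```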
Bispectrum VAD Analysis (1) • MO-LRT VAD
Bispectrum-based VAD results (4) • WF: Wiener filtering • FD: frame dropping • [Recognition results table not included]
SVM-enabled VAD (1) • Motivation: • Ability of SVMs to learn from experimental data • SVMs enable defining a decision function from labelled training data and classifying unseen examples (x, y) • Statistical learning theory restricts the class of functions the learning machine can implement
SVM-enabled VAD (2) • Hyperplane classifiers: training finds the w and b that define the maximal-margin hyperplane • Kernels (standard formulation sketched below)
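A reconstruction of the standard SVM formulation these bullets presumably refer to (the slide equations were images); the polynomial and RBF kernels are given as typical examples, not necessarily the ones used here.

```latex
% Maximal-margin training, kernel decision function, and example kernels (reconstruction)
\begin{align*}
  &\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w}\rVert^{2}
   \quad \text{s.t.} \quad y_i\bigl(\mathbf{w}\cdot\mathbf{x}_i + b\bigr) \ge 1
   && \text{(maximal-margin hyperplane)} \\
  &f(\mathbf{x}) = \operatorname{sgn}\Bigl(\textstyle\sum_i \alpha_i y_i\,
     k(\mathbf{x},\mathbf{x}_i) + b\Bigr)
   && \text{(decision function)} \\
  &k(\mathbf{x},\mathbf{x}') = (\mathbf{x}\cdot\mathbf{x}')^{d}
   \quad \text{or} \quad
   k(\mathbf{x},\mathbf{x}') = \exp\!\bigl(-\lVert\mathbf{x}-\mathbf{x}'\rVert^{2}/2\sigma^{2}\bigr)
   && \text{(example kernels)}
\end{align*}
```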
SVM-enabled VAD (4) • Feature extraction: • Training:
SVM-enabled VAD (5) • Feature extraction: • Decision function • 2-band features
SVM-enabled VAD (6) • Analysis: • 4 subbands • Noise reduction • Improvements: • Contextual speech features • Better results without noise reduction
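An illustrative sketch of an SVM-enabled VAD along the lines described above, using subband log-energy features with contextual frames; the number of subbands, the context width, the RBF kernel, and all function names are assumptions, not the authors' exact setup.

```python
import numpy as np
from sklearn.svm import SVC

N_BANDS = 4      # subbands per frame (illustrative)
CONTEXT = 2      # +/- frames of contextual speech features (illustrative)

def subband_log_energy(frame_psd, n_bands=N_BANDS):
    """Log-energy in n_bands equal subbands of one frame's power spectrum."""
    return np.array([np.log(np.mean(b) + 1e-10)
                     for b in np.array_split(frame_psd, n_bands)])

def contextual_features(psd_frames):
    """Stack each frame's subband features with its +/-CONTEXT neighbours."""
    feats = np.array([subband_log_energy(f) for f in psd_frames])
    padded = np.pad(feats, ((CONTEXT, CONTEXT), (0, 0)), mode="edge")
    return np.array([padded[t:t + 2 * CONTEXT + 1].ravel()
                     for t in range(len(feats))])

# Usage (hypothetical data): X_* are lists of per-frame power spectra,
# y_train holds frame labels (1 = speech, 0 = non-speech).
# clf = SVC(kernel="rbf").fit(contextual_features(X_train), y_train)
# vad_decisions = clf.predict(contextual_features(X_test))
```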
Dissemination (VAD) • Integrated bispectrum: • J. M. Górriz, J. Ramírez, C. G. Puntonet, J. C. Segura, “Generalized-LRT based voice activity detector”, IEEE Signal Processing Letters, 2006. • J. Ramírez, J. M. Górriz, J. C. Segura, C. G. Puntonet, A. Rubio, “Speech/Non-speech Discrimination based on Contextual Information Integrated Bispectrum LRT”, IEEE Signal Processing Letters, 2006. • J. M. Górriz, J. Ramírez, J. C. Segura, C. G. Puntonet, L. García, “Effective Speech/Pause Discrimination Using an Integrated Bispectrum Likelihood Ratio Test”, ICASSP 2006. • SVM VAD: • J. Ramírez, P. Yélamos, J. M. Górriz, J. C. Segura, “SVM-based Speech Endpoint Detection Using Contextual Speech Features”, IEE Electronics Letters, 2006. • J. Ramírez, P. Yélamos, J. M. Górriz, C. G. Puntonet, J. C. Segura, “SVM-enabled Voice Activity Detection”, ISNN 2006. • P. Yélamos, J. Ramírez, J. M. Górriz, C. G. Puntonet, J. C. Segura, “Speech Event Detection Using Support Vector Machines”, ICCS 2006.
HIWIRE MEETING, Athens, November 3-4, 2005. José C. Segura, Javier Ramírez