In-car Speech Recognition Using Distributed Microphones Tetsuya Shinde Kazuya Takeda Fumitada Itakura Center for Integrated Acoustic Information Research Nagoya University
Background
• In-car speech recognition using multiple microphones
  • Since the positions of the speaker and the noise sources are not fixed, many sophisticated algorithms are difficult to apply.
  • A robust criterion for optimizing parameters is necessary.
• Multiple Regression of Log Spectra (MRLS)
  • Minimize the log spectral distance between the reference speech and the multiple-regression estimate computed from the signals captured by the distributed microphones.
• Filter-parameter optimization for a microphone array (M. L. Seltzer, 2002)
  • Maximize the likelihood of a reference utterance by adjusting the filter parameters of the microphone-array system.
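The MRLS objective can be written as a least-squares problem. The following is a sketch with assumed notation (these symbols do not appear in the slide text): let s(ω) be the log spectrum of the reference speech, x_i(ω) the log spectrum observed at distributed microphone i, and λ_i the regression weights:

```latex
\hat{s}(\omega) = \lambda_0 + \sum_{i=1}^{N} \lambda_i \, x_i(\omega),
\qquad
\{\hat{\lambda}_i\} = \arg\min_{\{\lambda_i\}} \; E\!\left[ \left( s(\omega) - \hat{s}(\omega) \right)^{2} \right]
```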
Sample utterances: expressway / city area / idling (audio examples in the original slides)
Block diagram of MRLS: each distant-microphone signal passes through spectrum analysis to produce a log MFB output; multiple regression (MR) with the regression weights combines these outputs to approximate the log MFB output of the reference speech signal, which is then passed to speech recognition.
Modified spectral subtraction
• Assume that the power spectrum at each microphone position obeys a power-sum rule: with S the speech source, N the noise source, and H_i, G_i the transfer functions to microphone i, the observed spectrum X_i satisfies |X_i(ω)|² ≈ |H_i(ω)S(ω)|² + |G_i(ω)N(ω)|².
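Under the power-sum assumption, a noise power estimate can be subtracted bin by bin from the observed power spectrum. A minimal sketch, in which the noise estimate and the flooring constant are illustrative and not taken from the slides:

```python
import numpy as np

def spectral_subtraction(noisy_power, noise_power, floor=0.01):
    """Subtract an estimated noise power spectrum from the observed
    power spectrum, flooring the result to avoid negative power."""
    clean_power = noisy_power - noise_power
    # Floor residual negative values at a small fraction of the noisy power.
    return np.maximum(clean_power, floor * noisy_power)

# Example: one frame of a 4-bin power spectrum.
noisy = np.array([2.0, 1.5, 0.5, 3.0])
noise = np.full(4, 0.5)
# Third bin would go to zero and is floored at 0.01 * 0.5 = 0.005.
print(spectral_subtraction(noisy, noise))
```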
Multiple regression of log spectrum
• Approximate the reference log spectrum by a weighted sum (multiple regression) of the log spectra from the distributed microphones.
• The minimum error is attained when the regression weights satisfy the least-squares normal equations.
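Fitting such regression weights by least squares can be sketched as follows; the data here are synthetic and the variable names are assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

n_frames, n_mics = 200, 6
# Log spectra observed at the 6 distributed microphones (synthetic data).
X = rng.normal(size=(n_frames, n_mics))
# Reference close-talking log spectrum: a hidden linear mix plus small noise.
true_w = np.array([0.4, 0.3, 0.1, 0.1, 0.05, 0.05])
s = X @ true_w + 0.01 * rng.normal(size=n_frames)

# Least-squares solution of the normal equations: w = (X^T X)^{-1} X^T s.
w, *_ = np.linalg.lstsq(X, s, rcond=None)
s_hat = X @ w  # regression estimate of the reference log spectrum
```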
Reduction of freedom in optimization
• Constrain the regression weights to reduce the number of free parameters in the optimization.
(The slide plots the optimal regression weights on a 0 to 1 scale.)
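One common way to reduce the degrees of freedom, shown purely as an illustration (the slide's actual constraint is not recoverable from the text), is to require the weights to sum to one and solve the resulting equality-constrained least-squares problem via its KKT system:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))          # log spectra at the 6 microphones
true_w = np.array([0.5, 0.2, 0.1, 0.1, 0.05, 0.05])  # sums to 1
s = X @ true_w + 0.01 * rng.normal(size=200)

# Minimize ||s - Xw||^2 subject to sum(w) = 1 by solving the KKT system:
#   [X^T X  1] [w ]   [X^T s]
#   [1^T    0] [mu] = [  1  ]
A, b = X.T @ X, X.T @ s
ones = np.ones(6)
kkt = np.block([[A, ones[:, None]], [ones[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(kkt, np.append(b, 1.0))
w = sol[:6]                            # constrained regression weights
```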
Experimental Setup for Evaluation
• Recorded with 6 microphones
• Training data
  • Phonetically balanced sentences
  • 6,000 sentences while idling
  • 2,000 sentences while driving
  • 200 speakers
• Test data
  • 50 isolated-word utterances
  • 15 different driving conditions
    • road: idling / city area / expressway
    • in-car: normal / fan low / fan high / CD playing / window open
  • 18 speakers
(Figures: side and top views of the distributed microphone positions.)
Recognition experiments
• HMMs trained on:
  • Close-talking: close-talking microphone speech
  • Distant-mic.: nearest distant microphone (mic. #6) speech
  • MLLR: nearest distant-microphone speech after MLLR adaptation
  • MRLS: MRLS results obtained with the optimal regression weights for each training utterance
• Test utterances:
  • Close-talking speech (CLS-TALK)
  • Distant-microphone speech (DIST)
  • Distant-microphone speech after MLLR adaptation (MLLR)
  • MRLS results with the 6 regression weights optimized for:
    • each utterance (OPT)
    • each speaker (SPKER)
    • each driving condition (DR)
    • the whole training corpus (ALL)
Performance Comparison (average over 15 different conditions)
(The slide shows a chart comparing the methods, including MRLS.)
Clustering in-car sound environment
• Cluster the in-car sound environment using a spectral feature formed by concatenating the distributed-microphone signals.
(The slide shows the clustering results.)
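Such environment clustering might be sketched with plain k-means on frames of concatenated per-microphone log spectra; the cluster count, feature sizes, and synthetic data below are illustrative assumptions, not details from the slides:

```python
import numpy as np

def kmeans(features, k, n_iter=50):
    """Plain k-means over frames of concatenated spectral features.
    Centers are initialized deterministically at evenly spaced frames."""
    centers = features[np.linspace(0, len(features) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        # Assign each frame to its nearest center (Euclidean distance).
        d = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned frames.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels, centers

# Synthetic example: 2 sound environments, 6 mics x 4 log-spectral bins each.
rng = np.random.default_rng(1)
quiet = rng.normal(0.0, 0.1, size=(50, 24))
noisy = rng.normal(3.0, 0.1, size=(50, 24))
labels, _ = kmeans(np.vstack([quiet, noisy]), k=2)
```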
Adapting weights to sound environment
• Vary the regression weights in accordance with the classification results.
• This achieves the same performance as speaker- and condition-dependent weights.
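The adaptation step above can be sketched as selecting a precomputed weight set by nearest environment-cluster center; the centers and weight values here are made up for illustration:

```python
import numpy as np

# Hypothetical environment-cluster centers and per-cluster regression
# weights trained offline; all values are illustrative only.
centers = np.array([[0.0, 0.0], [5.0, 5.0]])          # e.g. idling vs. expressway
weights = np.array([[0.6, 0.2, 0.2], [0.2, 0.3, 0.5]])

def select_weights(frame_feature):
    """Classify the current sound environment by nearest cluster center
    and return the regression weights trained for that environment."""
    cluster = np.linalg.norm(centers - frame_feature, axis=1).argmin()
    return weights[cluster]

print(select_weights(np.array([0.3, -0.2])))  # nearest to the first center
```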
Summary
• Results
  • Log-spectral multiple regression is effective for in-car speech recognition using distributed multiple microphones.
  • In particular, when the regression weights are trained for a particular driving condition, very high performance can be obtained.
  • Adapting the weights to the driving condition improves performance.
• Future work
  • Combining MRLS with a microphone array.