In-car Speech Recognition Using Distributed Microphones Tetsuya Shinde Kazuya Takeda Fumitada Itakura Center for Integrated Acoustic Information Research Nagoya University
Background
• In-car speech recognition using multiple microphones
  • Since the positions of the speaker and the noise sources are not fixed, many sophisticated algorithms are difficult to apply.
  • A robust criterion for optimizing parameters is necessary.
• Multiple Regression of Log Spectra (MRLS)
  • Minimize the log spectral distance between the reference speech and the multiple-regression estimate computed from the signals captured by the distributed microphones.
• Filter-parameter optimization for a microphone array (M. L. Seltzer, 2002)
  • Maximize the likelihood of a reference utterance by adjusting the filter parameters of the microphone-array system.
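The MRLS objective can be written as a least-squares problem. The following is a sketch with assumed notation (these symbols do not appear in the slide text): let s(ω) be the log spectrum of the reference speech, x_i(ω) the log spectrum observed at distributed microphone i, and λ_i the regression weights:

```latex
\hat{s}(\omega) = \lambda_0 + \sum_{i=1}^{N} \lambda_i \, x_i(\omega),
\qquad
\{\hat{\lambda}_i\} = \arg\min_{\{\lambda_i\}} \; E\!\left[ \left( s(\omega) - \hat{s}(\omega) \right)^{2} \right]
```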
Sample utterances: expressway / city area / idling (audio examples in the original slides)
Block diagram of MRLS: each distant-microphone signal passes through spectrum analysis to produce a log MFB output; multiple regression (MR) with the regression weights combines these outputs to approximate the log MFB output of the reference speech signal, which is then passed to speech recognition.
Modified spectral subtraction
• Assume that the power spectrum at each microphone position obeys a power-sum rule: with S the speech source, N the noise source, and H_i, G_i the transfer functions to microphone i, the observed spectrum X_i satisfies |X_i(ω)|² ≈ |H_i(ω)S(ω)|² + |G_i(ω)N(ω)|².
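Under the power-sum assumption, a noise power estimate can be subtracted bin by bin from the observed power spectrum. A minimal sketch, in which the noise estimate and the flooring constant are illustrative and not taken from the slides:

```python
import numpy as np

def spectral_subtraction(noisy_power, noise_power, floor=0.01):
    """Subtract an estimated noise power spectrum from the observed
    power spectrum, flooring the result to avoid negative power."""
    clean_power = noisy_power - noise_power
    # Floor residual negative values at a small fraction of the noisy power.
    return np.maximum(clean_power, floor * noisy_power)

# Example: one frame of a 4-bin power spectrum.
noisy = np.array([2.0, 1.5, 0.5, 3.0])
noise = np.full(4, 0.5)
# Third bin would go to zero and is floored at 0.01 * 0.5 = 0.005.
print(spectral_subtraction(noisy, noise))
```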
Multiple regression of log spectrum
• Approximate the reference log spectrum by a weighted sum (multiple regression) of the log spectra from the distributed microphones.
• The minimum error is attained when the regression weights satisfy the least-squares normal equations.
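Fitting such regression weights by least squares can be sketched as follows; the data here are synthetic and the variable names are assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

n_frames, n_mics = 200, 6
# Log spectra observed at the 6 distributed microphones (synthetic data).
X = rng.normal(size=(n_frames, n_mics))
# Reference close-talking log spectrum: a hidden linear mix plus small noise.
true_w = np.array([0.4, 0.3, 0.1, 0.1, 0.05, 0.05])
s = X @ true_w + 0.01 * rng.normal(size=n_frames)

# Least-squares solution of the normal equations: w = (X^T X)^{-1} X^T s.
w, *_ = np.linalg.lstsq(X, s, rcond=None)
s_hat = X @ w  # regression estimate of the reference log spectrum
```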
Reduction of freedom in optimization
• Constrain the regression weights to reduce the number of free parameters in the optimization.
(The slide plots the optimal regression weights on a 0 to 1 scale.)
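One common way to reduce the degrees of freedom, shown purely as an illustration (the slide's actual constraint is not recoverable from the text), is to require the weights to sum to one and solve the resulting equality-constrained least-squares problem via its KKT system:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))          # log spectra at the 6 microphones
true_w = np.array([0.5, 0.2, 0.1, 0.1, 0.05, 0.05])  # sums to 1
s = X @ true_w + 0.01 * rng.normal(size=200)

# Minimize ||s - Xw||^2 subject to sum(w) = 1 by solving the KKT system:
#   [X^T X  1] [w ]   [X^T s]
#   [1^T    0] [mu] = [  1  ]
A, b = X.T @ X, X.T @ s
ones = np.ones(6)
kkt = np.block([[A, ones[:, None]], [ones[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(kkt, np.append(b, 1.0))
w = sol[:6]                            # constrained regression weights
```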
Experimental Setup for Evaluation
• Recorded with 6 microphones
• Training data
  • Phonetically balanced sentences
  • 6,000 sentences while idling
  • 2,000 sentences while driving
  • 200 speakers
• Test data
  • 50 isolated-word utterances
  • 15 different driving conditions
    • road: idling / city area / expressway
    • in-car: normal / fan low / fan high / CD playing / window open
  • 18 speakers
(Figures: side and top views of the distributed microphone positions.)
Recognition experiments
• HMMs trained on:
  • Close-talking: close-talking microphone speech
  • Distant-mic.: nearest distant microphone (mic. #6) speech
  • MLLR: nearest distant-microphone speech after MLLR adaptation
  • MRLS: MRLS results obtained with the optimal regression weights for each training utterance
• Test utterances:
  • Close-talking speech (CLS-TALK)
  • Distant-microphone speech (DIST)
  • Distant-microphone speech after MLLR adaptation (MLLR)
  • MRLS results with the 6 regression weights optimized for:
    • each utterance (OPT)
    • each speaker (SPKER)
    • each driving condition (DR)
    • the whole training corpus (ALL)
Performance Comparison (average over 15 different conditions)
(The slide shows a chart comparing the methods, including MRLS.)
Clustering in-car sound environment
• Cluster the in-car sound environment using a spectral feature formed by concatenating the distributed-microphone signals.
(The slide shows the clustering results.)
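Such environment clustering might be sketched with plain k-means on frames of concatenated per-microphone log spectra; the cluster count, feature sizes, and synthetic data below are illustrative assumptions, not details from the slides:

```python
import numpy as np

def kmeans(features, k, n_iter=50):
    """Plain k-means over frames of concatenated spectral features.
    Centers are initialized deterministically at evenly spaced frames."""
    centers = features[np.linspace(0, len(features) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        # Assign each frame to its nearest center (Euclidean distance).
        d = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned frames.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels, centers

# Synthetic example: 2 sound environments, 6 mics x 4 log-spectral bins each.
rng = np.random.default_rng(1)
quiet = rng.normal(0.0, 0.1, size=(50, 24))
noisy = rng.normal(3.0, 0.1, size=(50, 24))
labels, _ = kmeans(np.vstack([quiet, noisy]), k=2)
```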
Adapting weights to sound environment
• Vary the regression weights in accordance with the classification results.
• This achieves the same performance as speaker- and condition-dependent weights.
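The adaptation step above can be sketched as selecting a precomputed weight set by nearest environment-cluster center; the centers and weight values here are made up for illustration:

```python
import numpy as np

# Hypothetical environment-cluster centers and per-cluster regression
# weights trained offline; all values are illustrative only.
centers = np.array([[0.0, 0.0], [5.0, 5.0]])          # e.g. idling vs. expressway
weights = np.array([[0.6, 0.2, 0.2], [0.2, 0.3, 0.5]])

def select_weights(frame_feature):
    """Classify the current sound environment by nearest cluster center
    and return the regression weights trained for that environment."""
    cluster = np.linalg.norm(centers - frame_feature, axis=1).argmin()
    return weights[cluster]

print(select_weights(np.array([0.3, -0.2])))  # nearest to the first center
```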
Summary
• Results
  • Log-spectral multiple regression is effective for in-car speech recognition using distributed multiple microphones.
  • In particular, when the regression weights are trained for a particular driving condition, very high performance can be obtained.
  • Adapting the weights to the driving condition improves performance.
• Future work
  • Combining MRLS with a microphone array.