AVICAR: Audiovisual Speech Recognition in a Car

Mark Hasegawa-Johnson, Thomas Huang, Stephen E. Levinson, Camille Goudeseune, Hank Kaczmarski, Michael McLaughlin, Yoshihisa Shinagawa
Bowon Lee, Ming Liu, Laehoon Kim, Ameya Deoras, Sarah Borys, Jonathan Boley, Suketu Kamdar, Danfeng Li

AVICAR Recording Hardware
• 8 Mics, Pre-amps, Wooden Baffle. Best Place = Sunvisor.
• 4 Cameras, Glare Shields, Adjustable Mounting. Best Place = Dashboard.
• System is not permanently installed; mounting requires 10 minutes.

AVICAR Database
• 100 Talkers
• 8 Microphones, 4 Cameras
• 5 noise conditions: Engine idling, 35mph, 35mph with windows open, 55mph, 55mph with windows open
• Two types of utterances:
  • Digits & Phone numbers, for training and testing phone-number recognizers
  • Phonetically balanced sentences, for training and testing large vocabulary speech recognition
• Open-IP public release to 15 institutions, 3 continents

AVICAR Database

Experiments with AVICAR Data
• Video
  • Lip Tracking & Video Feature Extraction
  • 3D-from-Stereo Video Feature Extraction
• Audio
  • Beamforming & Speech Detection
  • Noise Modeling & Speech Enhancement
• Speech Recognition

Video: 3D-from-Stereo
• Point correspondences computed using dense stereo matching
• Correspondence around the lips is good most of the time.
• Occasional large errors caused by big differences of brightness and background.
[Figure panels: Left image; Right image; Transformed right image computed from left image]

Speech Enhancement: MVDR Beamformer + MMSE-logSA
• Goal: MMSE estimate of clean speech cepstrum given multichannel noisy measurement
• Solution:
  • Beamformer based on explicit models of (1) inter-microphone noise coherence and (2) auto interior frequency response...
  • Followed by a single-channel MMSE log spectral amplitude estimator
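For orientation, the two stages named above can be written in their standard textbook forms. This is a sketch only: the slide does not give the authors' exact formulation, and the symbols below (the noise coherence matrix Γ, the steering vector d, the a priori SNR ξ and the a posteriori SNR γ) are assumed notation, not taken from the source. In this reading, Γ would carry the inter-microphone noise coherence model and d the car-interior frequency response from the talker to each microphone.

```latex
% MVDR beamformer weights at frequency f (standard form; notation assumed)
\[
\mathbf{w}(f) \;=\;
\frac{\boldsymbol{\Gamma}^{-1}(f)\,\mathbf{d}(f)}
     {\mathbf{d}^{H}(f)\,\boldsymbol{\Gamma}^{-1}(f)\,\mathbf{d}(f)},
\qquad
Y(f) \;=\; \mathbf{w}^{H}(f)\,\mathbf{X}(f)
\]

% Single-channel MMSE log-spectral-amplitude gain applied to Y(f)
\[
G(\xi,\gamma) \;=\; \frac{\xi}{1+\xi}\,
\exp\!\left(\frac{1}{2}\int_{v}^{\infty}\frac{e^{-t}}{t}\,dt\right),
\qquad
v \;=\; \frac{\xi}{1+\xi}\,\gamma
\]
```

The first expression minimizes output noise power subject to passing the target direction undistorted; the second is the usual log-spectral-amplitude gain applied to the beamformer output.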

MVDR+MMSE-logSA
• MVDR eliminates high-frequency noise; MMSE-logSA eliminates low-frequency noise.
• MMSE-logSA adds reverberation at low frequencies; this reverberation does not seem to affect speech recognition accuracy.

Speech Recognition Accuracy

Summary of Results
• 100-talker audiovisual speech database recorded in moving automobiles
• Stereo video features for visual speech recognition
• MMSE multichannel estimate of cepstral features
• Recognition using factorial HMM. Preliminary results: 57% WER reduction

Multimodal Speech Recognition in Noise: Factorial HMM
• Chain q(t) models speech audio
• Chain r(t) models noise audio
• Third chain (not shown) will model speech video; speech video & audio synchronized via joint transition probabilities (Chu & Huang, 2002)
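The two-chain structure above can be illustrated with a minimal sketch of how a factorial HMM forms its joint state space. It assumes the speech chain q(t) and the noise chain r(t) evolve independently, so the joint transition probability is the product of the per-chain probabilities. The function name `joint_transitions` and all matrix values are hypothetical toy choices, not taken from the presentation.

```python
from itertools import product


def joint_transitions(a_q, a_r):
    """Build the transition matrix over product states (q, r).

    Assumes independent chains: P((i, j) -> (k, l)) = a_q[i][k] * a_r[j][l].
    Returns the list of product states and the joint matrix in that order.
    """
    states = list(product(range(len(a_q)), range(len(a_r))))
    joint = [
        [a_q[i][k] * a_r[j][l] for (k, l) in states]
        for (i, j) in states
    ]
    return states, joint


# Hypothetical toy transition matrices (rows sum to 1).
A_SPEECH = [[0.9, 0.1], [0.2, 0.8]]  # chain q(t): speech audio
A_NOISE = [[0.7, 0.3], [0.4, 0.6]]   # chain r(t): noise audio

states, A_joint = joint_transitions(A_SPEECH, A_NOISE)
for state, row in zip(states, A_joint):
    print(state, [round(p, 3) for p in row])
```

With two states per chain the joint model has four states. A coupled model, such as the audio-video synchronization mentioned in the last bullet, would replace the simple product with joint transition probabilities.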

Factorial HMM Preliminary Test: Recognize Speech in Music

| Movement | Speaker Sequence | Digits | Trials | HMM WER | HMM Ave. SNR | FHMM WER | FHMM Ave. SNR |
| Allegro Assai | Random | 0 - 6 | 35 | 83% | -0.39 | 35% | -0.65 |
| Allegro Assai | Sequential | 0 - 6 | 35 | 77% | 0.55 | 34% | 0.55 |
| Andante | Random | 3 - 8 | 35 | 70% | 4.34 | 27% | 4.77 |
| Andante | Sequential | 1 - 8 | 72 | 70% | 10.43 | 34% | 10.43 |

Factorial HMM has 57% lower Word Error Rate (WER) than HMM
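The 57% figure can be checked against the per-condition WERs listed above. The slide does not say how the number was computed; the sketch below assumes it is the relative reduction in the unweighted mean WER across the four conditions, and also computes a trial-weighted variant for comparison. Variable names are illustrative.

```python
# Per-condition results copied from the slide: (trials, HMM WER %, FHMM WER %).
RESULTS = [
    (35, 83, 35),  # Allegro Assai, random
    (35, 77, 34),  # Allegro Assai, sequential
    (35, 70, 27),  # Andante, random
    (72, 70, 34),  # Andante, sequential
]


def relative_reduction(baseline, improved):
    """Relative WER reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


# Assumption: unweighted mean over the four conditions.
hmm_mean = sum(h for _, h, _ in RESULTS) / len(RESULTS)
fhmm_mean = sum(f for _, _, f in RESULTS) / len(RESULTS)
unweighted = relative_reduction(hmm_mean, fhmm_mean)

# Alternative: weight each condition by its number of trials.
total_trials = sum(n for n, _, _ in RESULTS)
hmm_weighted = sum(n * h for n, h, _ in RESULTS) / total_trials
fhmm_weighted = sum(n * f for n, _, f in RESULTS) / total_trials
weighted = relative_reduction(hmm_weighted, fhmm_weighted)

print(f"unweighted relative WER reduction: {unweighted:.1%}")
print(f"trial-weighted relative WER reduction: {weighted:.1%}")
```

Under the unweighted assumption the result rounds to the 57% quoted on the slide; the trial-weighted variant comes out slightly lower.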
