Automotive Speech Enhancement of Today: Applications, Challenges and Solutions

Automotive Speech Enhancement of Today: Applications, Challenges and Solutions Tim Haulick Harman/Becker Automotive Systems

Communication channel to vehicle Receive side processing:gain control Communication channel in vehicle Phone In-car communication:feedback suppression,automatic gain adjustment Speechdialogsystem Send side processing:beamforming, echo and noise suppression Communication channel from vehicle Automotive Speech Enhancement Applications

Extended Speech Enhancement Receive side processing:gain control,bandwidth extension, adaptive equalization Mobile Phone Speaker independentspeech knowledge Speaker (in-)dependentspeech knowledge Speechdialogsystem Send side processing:echo & noise suppression, wind-noise suppressionspeech reconstruction,beamforming/postfilter

Bandwidth Extension – Examples Narrowband input Narrowband connection: Bandwidth extension for narrowband speech signals (bandwidth 3.4 …3.8 kHz) – extension of low frequency components and extension of high frequency components up to 5.5 or 8 kHz. Narrowband output Wideband input Wideband connection: • Bandwidth extension for wideband speech signals (bandwidth 7 kHz, e.g. AMR wideband codec G.722.2) – • extension of high frequency components up to 11kHz. Wideband output

Bandwidth Extension - Enhancements • Adaptation to different transmission channels (variable bandwidth, different signal-to-noise ratios) • Adaptation to different sound amplifiers • Adaptation depending on the background noise in the vehicle

Speech Reconstruction Motivation: At medium and high driving speed low frequency speech components are often masked by the background noise. However, standard noise suppression methods of today often fail in these situations. As a result the processed speech signal sounds thin and distorted. For further improvementof the speech quality a reconstruction approach is an alternative. However, speech reconstruction starts where conventional noise reduction fails …

Receive side processing Phone(downlink) Loud-speaker Analysisfilter bank Speaker (in-)dependent speech knowledge Echocancellation Speech reconstruction Micro-phone Phone(uplink) Residual echoand noise suppression Mixer Analysisfilter bank Synthesisfilter bank Speech Reconstruction – Algorithmic Overview

Speech Reconstruction – Audio Examples Recon-structed Conven-tional Recon-structed Microphonesignal Conven-tional Frequency in Hz Frequency in Hz 120 km/h Time in seconds Time in seconds Recon-structed Microphonesignal Conven-tional Recon-structed Conven-tional Frequency in Hz Frequency in Hz 160 km/h Time in seconds Time in seconds After CDMA coding Before CDMA coding

A subjective evaluation was performed by 10 trained listeners. The signals have been coded and decoded with the CDMA enhanced variable rate codec (EVRC) prior to listening. The test set comprised 20 different test scenarios with SNRs ranging from 3 dB to 12 dB. Speech Reconstruction – Evaluation (CDMA)

Beamformer Postfilter GSC Micro-phonespectra Outputspectrum Spatialpostfilter Fixed beamf. MAPapproxi-mation Interferencecanceller |…|² Noisepowerestimation Blocking matrix Ratiocomputat. Beamformer/Postfilter • Beamformers perform a directional filtering: signals from the desired speaker direction are passed while signals arriving from other directions are attenuated. • The achievable noise reduction is dependent on the spatio-temporal properties of the soundfield and the number of microphones. • By extending the beamformer with a spatial postfilter a high directionality can even be achieved with a small number of microphones.

Beamformer/Postfilter – Audio Example Audio example: 2-channel microphone array, 120 km/h, driver and passenger are talking Signal of the first microphone 2-channel processing (beamformer/postfilter) © 2008 Harman International Industries, Incorporated. All rights reserved. Page 11

Beamformer/Postfilter – Evaluation Results Reference system: 2-channel adaptive beamformer (GSC) without postfilter Voice Recognition Performance Speech Distortion 7 30 6 25 5 20 4 Log-Spectral Distance relative enhancement of WER [%] 15 3 10 2 5 1 Adaptive Beamformer with Postfilter Adaptive Beamformer 0 0 120 km/h 100 km/h window 1/4 open 120 km/h double-talk 120 km/h 100 km/h window 1/4 open 120 km/h double-talk By extending the adaptive beamformer with a spatial postfilter the voice recognition performance can be improved significantly without impairing the speech quality.

Wind Noise Suppression • Problem: Due to design reasons and lack of space the standard wind shield of hands-free microphones is often insufficient. For this reason the microphone signal is often impaired by wind noise caused by the fan or an open top of a convertible. • Solution: Suppression of wind noise by algorithmic means taking advantage of the statistical properties of the noise 2 channel processed output signal Microphone Signal

In-Car Communication (ICC) Current Situation: • Communication between passengers is difficult, because of the acoustic loss (especially front to back). • Front passengers have to speak louder than normal – longer conversations will be tiring. • Driver turns around – road safety is reduced. Solution: • Improve the speech quality and intelligibility by means of an intercom system. Application: • Mid and high class automobiles, which are already equipped with the necessary audio and signal processing components. • Minibuses, Vans, etc. (cars more than 2 rows of seats). Passenger compartment -5…-15dB* *Acoustic loss (referred to the ear of the driver)

One-Way System 2-4 microphones 2-4 loudspeakers Two-Way System 4-8 microphones 6-8 loudspeakers ICC System ICC System Configurations

Driving Scenarios 0 km/h beside motorway 130 km/h on motorway Prerecorded speech sentences with different Lombard levels were played back via an artificial mouth. Binaural recordings were made by means of a HEAD acoustics NoiseBook on the seat behind the driver. Subjective Evaluation

0 km/h, vehicle parked close to a motorway: 19.7 % prefer the system to be switched off 29.7 % have no preference 50.6 % prefer an activated system 130 km/h, motorway: 4.3 % prefer the system to be switched off 7.1 % have no preference 88.6 % prefer an activated system Results of the Comparison Mean Opinion Score Test (25 signal pairs for each driving situation / 15 listeners per scenario):

0 km/h, vehicle parked close to a motorway: No significant difference (95.2 % system off versus 95.0 % system on) Due to the automatic gain adjustment the intercom system operates with only very small gain at these noise levels 130 km/h, motorway: Significant improvement of the MRT error rate Nearly 50 % error reduction (85.4 % correct answers increased to 92.2 % correct answers) Results of Modified Rhyme Tests (MRT) (48 utterances were presented to each listener per driving situation):

Automotive Speech Enhancement of Today: Applications, Challenges and Solutions