Gammachirp Auditory Filter

Alex Park, May 7th, 2003

Project Overview
• Goal:
  • Investigate use of (non-linear) auditory filters for speech analysis
• Background:
  • Sound analysis in auditory periphery similar to wavelet transform
• Comparison:
  • Traditional Short-Time Fourier analysis
  • Gammatone wavelet based analysis (auditory filter)
• Extension:
  • Gammachirp filter has level-dependent parameters which can model non-linear characteristics of auditory periphery
• Implementation:
  • Specifics of Gammachirp implementation
  • How to incorporate level dependency

Auditory Physiology
• Sound pressure variations in the air are transmitted through the outer and middle ears onto one end of the cochlea
• The basilar membrane, which runs the length of the cochlea, maps frequency to place of maximal displacement
[Figure: auditory pathway; labels: Outer ear, Middle ear, Cochlea, Auditory Nerve, Cortex, Basilar Membrane, Low freq (200 Hz), High freq (20 kHz)]

Motivation – Why better auditory models?
• Automatic Speech Recognition (ASR)
  • ASR systems perform adequately in ‘clean’ conditions
  • Robustness is a major problem; degradation in low SNR conditions is much worse than for humans
• Hearing research
  • Build better hearing aids and cochlear implants
  • Hearing-impaired subjects with a damaged cochlea have trouble understanding speech in noisy environments
  • Current hearing aids perform linear amplification, so they amplify noise as well as the signal
• Is the lack of compressive non-linearity in the front-end a common link?

Non-stationary Nature of Speech
• Why is speech a good candidate for local frequency analysis?
[Figure: waveform of the word “tapestry”, with segments /t/, /ae/, /s/ and the annotations tone, transient, noise]

Time-Frequency Representation
• The most common way of representing changing spectral content is the Short Time Fourier Transform (STFT)
[Figure labels: FFT, Power]

Spectrogram from STFT “tapestry”

STFT Characteristics
• We can think of the STFT as filtering using the following basis
• In the frequency domain, we are using a filterbank consisting of linearly spaced, constant bandwidth filters
[Figure: STFT basis functions and filter responses; axis label: Freq (Hz)]
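The filterbank reading of the STFT can be checked numerically. The sketch below is a minimal illustration, not the author's code: the Hann window, the 256-point frame, and the 16 kHz sample rate are assumptions, and `stft_basis` and `stft_frame` are hypothetical helper names. Each basis function is the same window modulated to a different center frequency, which is why the filters are linearly spaced and share one bandwidth.

```python
import numpy as np

def stft_basis(n_fft=256, fs=16000.0):
    """Return (center_freqs, basis) for a windowed Fourier basis.

    basis[k, n] = w[n] * exp(j*2*pi*k*n/n_fft): one window w, modulated to
    linearly spaced center frequencies k*fs/n_fft.
    """
    n = np.arange(n_fft)
    window = np.hanning(n_fft)                 # assumed window choice
    k = np.arange(n_fft // 2 + 1)              # non-negative frequency bins
    basis = window[None, :] * np.exp(2j * np.pi * k[:, None] * n[None, :] / n_fft)
    center_freqs = k * fs / n_fft
    return center_freqs, basis

def stft_frame(frame, basis):
    """Project one frame onto the basis (equivalent to a windowed DFT)."""
    return basis.conj() @ frame

freqs, basis = stft_basis()
spacing = np.diff(freqs)                       # constant spacing between filters
```

Because every row of `basis` uses the same window, the bandwidth of each filter is set by the window alone and does not change with center frequency.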

Auditory Filterbanks
• Unlike the STFT, physiological data indicates that auditory filters:
  • are spaced more closely at lower frequencies than at high frequencies
  • have narrower bandwidths at lower frequencies (constant-Q)
• The Gammatone filterbank proposed by Patterson models these characteristics using a wavelet transform
• The mother wavelet, or kernel function, is a tone carrier shaped by a Gamma envelope
[Equation: Gammatone kernel; labeled terms: Tone carrier, Gamma Envelope]
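The kernel equation on this slide did not survive in the transcript. A commonly quoted form of the Gammatone kernel is g(t) = a t^(n-1) exp(-2π b ERB(fc) t) cos(2π fc t + φ), and the sketch below evaluates it. The order n = 4, the factor b = 1.019, the ERB formula (Glasberg and Moore form), the 16 kHz sample rate, and the helper names are all assumptions taken from the general literature, not values confirmed by the slides.

```python
import numpy as np

def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (assumed Glasberg & Moore form)."""
    return 24.7 + 0.108 * np.asarray(f_hz, dtype=float)

def gammatone_ir(fc, fs=16000.0, dur=0.05, order=4, b=1.019, phase=0.0):
    """Sampled Gammatone kernel: Gamma envelope times a tone carrier."""
    t = np.arange(int(dur * fs)) / fs
    envelope = t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
    carrier = np.cos(2 * np.pi * fc * t + phase)
    g = envelope * carrier
    return g / np.max(np.abs(g))               # peak-normalize for plotting

def peak_frequency(ir, fs, n_fft=8192):
    """Frequency (Hz) at which the magnitude response is largest."""
    mag = np.abs(np.fft.rfft(ir, n_fft))
    return np.fft.rfftfreq(n_fft, 1.0 / fs)[np.argmax(mag)]

fs = 16000.0
center_freqs = [250.0, 1000.0, 4000.0]         # illustrative channels
bank = [gammatone_ir(fc, fs) for fc in center_freqs]
peaks = [peak_frequency(ir, fs) for ir in bank]
bandwidths = erb(center_freqs)                 # grows with center frequency
```

The bandwidth passed to the envelope grows with center frequency, so low-frequency channels are narrower than high-frequency ones, in contrast to the fixed-bandwidth STFT filters.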

Gammatone Characteristics
• Unlike the STFT, the Gammatone filterbank uses the following basis
• The corresponding frequency responses are
[Figure: Gammatone basis functions and frequency responses; axis label: Freq (Hz)]

What are we missing?
• The Gammatone filterbank has constant-Q bandwidths and logarithmic spacing of center frequencies
• Also, the Gamma envelope decays exponentially, giving effectively compact support
• But, the filters are 1) symmetric and 2) linear
• Psychophysical experiments indicate that auditory filter shapes are:
  1) Asymmetric
    • Sharper drop-off on high frequency side
  2) Non-linear
    • Filter shape and gain change depending on input level
    • Compressive non-linearity of the cochlea
    • Important for hearing in noise and for dynamic range

Gammachirp Characteristics
• The Gammachirp filter developed by Irino & Patterson uses a modified version of the Gammatone kernel
[Equation: Gammachirp kernel; labeled terms: Gamma Envelope, Tone carrier, Chirp term]
• Frequency response is asymmetric, can fit passive filter
• Level-dependent parameters can fit changes due to stimulus
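The Gammachirp kernel equation is also missing from the transcript. In the form usually attributed to Irino and Patterson, a chirp term c ln(t) is added to the carrier phase: g(t) = a t^(n-1) exp(-2π b ERB(fr) t) cos(2π fr t + c ln(t) + φ). The sketch below is an illustration under assumptions: c = -2, b = 1.019, n = 4, the ERB formula, and the helper names are illustrative choices, not parameters taken from the slides.

```python
import numpy as np

def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (assumed Glasberg & Moore form)."""
    return 24.7 + 0.108 * np.asarray(f_hz, dtype=float)

def gammachirp_ir(fr, fs=16000.0, dur=0.05, order=4, b=1.019, c=-2.0, phase=0.0):
    """Sampled Gammachirp kernel; c = 0 reduces it to a Gammatone."""
    t = np.arange(1, int(dur * fs) + 1) / fs   # start at t > 0 so log(t) is defined
    envelope = t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fr) * t)
    carrier = np.cos(2 * np.pi * fr * t + c * np.log(t) + phase)
    g = envelope * carrier
    return g / np.max(np.abs(g))

def magnitude_db(ir, fs, n_fft=8192):
    """Magnitude response in dB relative to its own peak."""
    mag = np.abs(np.fft.rfft(ir, n_fft))
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    return freqs, 20 * np.log10(mag / mag.max())

fs = 16000.0
freqs, chirp_db = magnitude_db(gammachirp_ir(1000.0, fs, c=-2.0), fs)
_, tone_db = magnitude_db(gammachirp_ir(1000.0, fs, c=0.0), fs)
peak_hz = freqs[np.argmax(chirp_db)]           # shifted below fr when c < 0
```

With a negative c, the response peak moves below fr and the high-frequency skirt falls off faster than the low-frequency skirt, which is the asymmetry the slide describes.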

Implementation
• In the frequency domain, the Gammachirp can be obtained by cascading a fixed Gammatone filter with an asymmetric filter
• To fit psychophysical data, a fixed Gammachirp is cascaded with level-dependent asymmetric IIR filters
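The frequency-domain decomposition is not shown in the transcript. In the formulation usually attributed to Irino and Patterson, it is written roughly as follows, where G_T is the Gammatone response, f_r the nominal center frequency, and a_Γ a gain constant (symbol names here are assumptions):

```latex
\[
|G_C(f)| = a_\Gamma \, |G_T(f)| \, e^{c\,\theta(f)},
\qquad
\theta(f) = \arctan\!\left(\frac{f - f_r}{b\,\mathrm{ERB}(f_r)}\right)
\]
```

Since θ(f) increases with f, a negative c makes the factor e^{cθ(f)} attenuate frequencies above f_r relative to those below it, so the cascade yields a steeper high-frequency skirt.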

Comparison: Tone vs. Passive Chirp Outputs
• Gammatone output seems to have better frequency resolution
• Passive Gammachirp output seems to have better time resolution

Comparison: Tone vs. Active Chirp Outputs

Incorporating level dependency
• As illustrated in the previous slide, the passive Gammachirp output offers little advantage on clean speech using fixed stimulus levels
• We can incorporate parameter control via feedback:
  • Compute passive GC spectrogram
  • Segment into frames (S1, S2, ..., SN-1, SN)
  • For each time frame:
    • Get stimulus level per channel
    • Filter with level-specific filter
  • Reconstruct frames
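The frame-wise feedback loop on this slide can be sketched as below. This is a minimal illustration under stated assumptions, not the author's implementation: the frame length, 50% overlap, Hann overlap-add reconstruction, and RMS level estimate are assumptions, and `compressive_gain` is a hypothetical placeholder that applies a level-dependent gain instead of the level-dependent asymmetric IIR filters described in the slides.

```python
import numpy as np

def frame_levels_db(channels, frame_len, hop, floor=1e-12):
    """RMS level in dB for every (channel, frame) of a filterbank output."""
    n_ch, n = channels.shape
    starts = np.arange(0, n - frame_len + 1, hop)
    levels = np.empty((n_ch, len(starts)))
    for j, s in enumerate(starts):
        seg = channels[:, s:s + frame_len]
        rms = np.sqrt(np.mean(seg ** 2, axis=1))
        levels[:, j] = 20 * np.log10(np.maximum(rms, floor))
    return starts, levels

def compressive_gain(frame, level_db, ch, slope=0.5, ref_db=-20.0):
    """Placeholder level-specific filter: a gain that compresses level.

    Stand-in only; a real system would select an asymmetric IIR filter here.
    """
    gain_db = (slope - 1.0) * (level_db - ref_db)
    return frame * 10 ** (gain_db / 20)

def level_dependent_pass(channels, filter_for_level, frame_len=256, hop=128):
    """Measure level per frame and channel, filter, then overlap-add."""
    window = np.hanning(frame_len + 1)[:-1]    # periodic Hann sums to 1 at 50% overlap
    starts, levels = frame_levels_db(channels, frame_len, hop)
    out = np.zeros_like(channels, dtype=float)
    for j, s in enumerate(starts):
        for ch in range(channels.shape[0]):
            frame = channels[ch, s:s + frame_len]
            filtered = filter_for_level(frame, levels[ch, j], ch)
            out[ch, s:s + frame_len] += window * filtered
    return out, levels

# Two stand-in "channels": the same tone at levels 40 dB apart.
n = np.arange(1024)
tone = np.sin(2 * np.pi * n / 32)
x = np.vstack([tone, 0.01 * tone])
y, levels = level_dependent_pass(x, compressive_gain)
```

In this toy setup, the louder channel receives less gain than the quieter one, so the level difference between the two channels shrinks at the output. Swapping `compressive_gain` for a function that returns a properly filtered frame would keep the same loop structure.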

Sample outputs
[Figure: output panels for clean speech and for 40 dB, 30 dB, and 20 dB SNR]

References
• Bleeck, S., Patterson, R.D., and Ives, T. (2003). Auditory Image Model for Matlab. Centre for the Neural Basis of Hearing. http://www.mrc-cbu.cam.ac.uk/cnbh/aimmanual/Introduction/
• Irino, T. and Patterson, R.D. (2001). “A compressive gammachirp auditory filter for both physiological and psychophysical data,” J. Acoust. Soc. Am. 109, 2008-2022.
• Pickles, J.O. (1988). An Introduction to the Physiology of Hearing (Academic, London).
• Slaney, M. (1993). “An efficient implementation of the Patterson-Holdsworth auditory filterbank,” Apple Computer Technical Report #35.
• Slaney, M. (1998). “Auditory Toolbox for Matlab,” Interval Research Technical Report #1998-010. http://rvl4.ecn.purdue.edu/~malcolm/interval/1998-010/

Sidenote
[Figure: output panels for clean speech, 40 dB SNR, and 30 dB SNR]
