Gammachirp Auditory Filter
Alex Park
May 7th, 2003
Project Overview
• Goal:
  • Investigate use of (non-linear) auditory filters for speech analysis
• Background:
  • Sound analysis in the auditory periphery is similar to a wavelet transform
• Comparison:
  • Traditional Short-Time Fourier analysis
  • Gammatone wavelet-based analysis (auditory filter)
• Extension:
  • The Gammachirp filter has level-dependent parameters which can model non-linear characteristics of the auditory periphery
• Implementation:
  • Specifics of Gammachirp implementation
  • How to incorporate level dependency
Auditory Physiology
• Sound pressure variation in the air is transduced through the outer and middle ears onto one end of the cochlea
• The basilar membrane, which runs the length of the cochlea, maps frequency to place of maximal displacement
[Figure: auditory pathway. Labels: Outer ear, Middle ear, Cochlea, Auditory Nerve, Cortex. Basilar Membrane ends labeled Low freq (200 Hz) and High freq (20 kHz)]
Motivation – Why better auditory models?
• Automatic Speech Recognition (ASR)
  • ASR systems perform adequately in ‘clean’ conditions
  • Robustness is a major problem; performance in low-SNR conditions degrades much more than it does for human listeners
• Hearing research
  • Build better hearing aids and cochlear implants
  • Hearing-impaired subjects with a damaged cochlea have trouble understanding speech in noisy environments
  • Current hearing aids perform linear amplification, so they amplify the noise as well as the signal
• Is the lack of a compressive non-linearity in the front-end a common link?
Non-stationary Nature of Speech
• Why is speech a good candidate for local frequency analysis?
[Figure: waveform of the word “tapestry”, with segments marked /t/ (transient), /ae/ (tone), and /s/ (noise)]
Time-Frequency Representation
• The most common way of representing changing spectral content is the Short-Time Fourier Transform (STFT)
[Figure: STFT illustration. Label: FFT Power]
[Figure: spectrogram from the STFT of “tapestry”]
STFT Characteristics
• We can think of the STFT as filtering with a basis of windowed sinusoids
[Figure: STFT basis functions]
• In the frequency domain, we are using a filterbank consisting of linearly spaced, constant-bandwidth filters
[Figure: STFT filter frequency responses. Axis: Freq (Hz)]
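The filterbank view of the STFT can be checked numerically. The following is a minimal sketch, not the author's code: the frame length, hop size, Hann window, and the 1 kHz test tone are all illustrative assumptions. It shows that every analysis bin shares the same window (hence the same bandwidth) and that the bin centers are linearly spaced.

```python
import numpy as np

def stft_power(x, frame_len=256, hop=128):
    """Power spectrogram: one row per frame, one column per frequency bin."""
    window = np.hanning(frame_len)  # the same window for every bin => constant bandwidth
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

fs = 16000                                  # assumed sampling rate (Hz)
t = np.arange(fs) / fs                      # one second of samples
x = np.sin(2 * np.pi * 1000.0 * t)          # 1 kHz test tone

S = stft_power(x)
bin_freqs = np.fft.rfftfreq(256, d=1.0 / fs)  # bin centers, spaced fs / frame_len apart
peak_bin = int(np.argmax(S.mean(axis=0)))     # bin that captures the tone
```

With these settings the bin spacing is fs / 256 = 62.5 Hz at every frequency, which is the property the auditory filterbanks on the next slide do not share.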
Auditory Filterbanks
• Unlike the STFT, physiological data indicate that auditory filters:
  • are spaced more closely at low frequencies than at high frequencies
  • have narrower bandwidths at lower frequencies (constant-Q)
• The Gammatone filterbank proposed by Patterson models these characteristics using a wavelet transform
• The mother wavelet, or kernel function, is the product of a gamma envelope and a tone carrier
[Equation: Gammatone kernel, annotated “Tone carrier” and “Gamma Envelope”]
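A Gammatone filterbank with these properties might be sketched as follows. This is an illustration under stated assumptions rather than the implementation used for the slides: the kernel is taken in its commonly published form, t^(n-1) · exp(-2π·b·ERB(fc)·t) · cos(2π·fc·t + φ), with the Glasberg and Moore ERB formula, and the values n = 4, b = 1.019, the 200 Hz to 6 kHz range, and the function names are all assumptions.

```python
import numpy as np

def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Hz), Glasberg & Moore formula."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def erb_rate(f_hz):
    """Frequency (Hz) -> position on the ERB-rate scale."""
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)

def inverse_erb_rate(e):
    """Position on the ERB-rate scale -> frequency (Hz)."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def center_frequencies(f_lo, f_hi, num):
    """Center frequencies equally spaced on the ERB-rate scale."""
    return inverse_erb_rate(np.linspace(erb_rate(f_lo), erb_rate(f_hi), num))

def gammatone_kernel(fc, fs, duration=0.05, n=4, b=1.019, phase=0.0):
    """Gamma envelope times tone carrier, normalized to unit peak."""
    t = np.arange(int(duration * fs)) / fs
    envelope = t ** (n - 1) * np.exp(-2.0 * np.pi * b * erb_hz(fc) * t)
    carrier = np.cos(2.0 * np.pi * fc * t + phase)
    g = envelope * carrier
    return g / np.max(np.abs(g))

fcs = center_frequencies(200.0, 6000.0, 16)          # assumed range and channel count
kernels = [gammatone_kernel(fc, fs=16000) for fc in fcs]
```

Because the channels are equally spaced on the ERB-rate scale, neighboring center frequencies are closer together at the low end, and the bandwidth term b·ERB(fc) grows with fc.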
Gammatone Characteristics
• Unlike the STFT, the Gammatone filterbank uses the following basis
[Figure: Gammatone basis functions]
• The corresponding frequency responses are
[Figure: Gammatone filter frequency responses. Axis: Freq (Hz)]
What are we missing?
• The Gammatone filterbank has constant-Q bandwidths and logarithmic spacing of center frequencies
• Also, the gamma envelope decays exponentially, so each kernel has effectively compact support
• But the filters are 1) symmetric and 2) linear
• Psychophysical experiments indicate that auditory filter shapes are:
  1) Asymmetric
    • Sharper drop-off on the high-frequency side
  2) Non-linear
    • Filter shape and gain change depending on input level
    • Compressive non-linearity of the cochlea
    • Important for hearing in noise and for dynamic range
Gammachirp Characteristics
• The Gammachirp filter developed by Irino & Patterson uses a modified version of the Gammatone kernel
[Equation: Gammachirp kernel, annotated “Gamma Envelope”, “Tone carrier”, and “Chirp term”]
• The frequency response is asymmetric and can fit the passive filter shape
• Level-dependent parameters can fit changes due to stimulus level
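For reference, the kernel annotated on this slide (gamma envelope, tone carrier, chirp term) is usually written as below. This is the standard form from Irino and Patterson (2001), not a transcription of the slide's own equation, so the symbol names ($a$, $n$, $b$, $c$, $f_r$, $\phi$) are assumptions about the notation; $\mathrm{ERB}(f_r)$ is the equivalent rectangular bandwidth at $f_r$.

```latex
g_c(t) \;=\; a\,
  \underbrace{t^{\,n-1}\, e^{-2\pi b\,\mathrm{ERB}(f_r)\,t}}_{\text{gamma envelope}}\;
  \cos\!\bigl(
    \underbrace{2\pi f_r t}_{\text{tone carrier}}
    \;+\; \underbrace{c\,\ln t}_{\text{chirp term}}
    \;+\; \phi
  \bigr),
  \qquad t > 0
```

Setting $c = 0$ removes the chirp term and recovers the Gammatone kernel.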
Implementation
• Looking in the frequency domain, the Gammachirp can be obtained by cascading a fixed Gammatone filter with an asymmetric filter
• To fit psychophysical data, a fixed Gammachirp is cascaded with level-dependent asymmetric IIR filters
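The first bullet can be checked numerically. For the complex (analytic) form of the kernels, the Gammachirp magnitude response equals the Gammatone magnitude response multiplied by an asymmetric function exp(c·θ(f)), with θ(f) = arctan((f − f_r) / (b·ERB(f_r))), up to a constant gain. The sketch below compares the two sides at three frequencies; the parameter values (n = 4, b = 1.019, c = −2, f_r = 1 kHz, 16 kHz sampling) and the function names are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Hz), Glasberg & Moore formula."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def complex_kernel(fr, fs, c, duration=0.05, n=4, b=1.019):
    """Analytic gammachirp kernel; c = 0 gives the analytic gammatone."""
    t = np.arange(1, int(duration * fs) + 1) / fs   # start at 1/fs to avoid ln(0)
    envelope = t ** (n - 1) * np.exp(-2.0 * np.pi * b * erb_hz(fr) * t)
    phase = 2.0 * np.pi * fr * t + c * np.log(t)
    return t, envelope * np.exp(1j * phase)

def magnitude_at(t, x, freqs):
    """Magnitude of the Fourier sum of x evaluated at the given frequencies."""
    return np.abs(np.exp(-2j * np.pi * np.outer(freqs, t)) @ x)

fs, fr, c, b = 16000, 1000.0, -2.0, 1.019
freqs = np.array([800.0, 1000.0, 1200.0])

t, gt = complex_kernel(fr, fs, c=0.0)   # gammatone
_, gc = complex_kernel(fr, fs, c=c)     # gammachirp

# Measured ratio of the two responses vs. the predicted asymmetric function.
measured = magnitude_at(t, gc, freqs) / magnitude_at(t, gt, freqs)
theta = np.arctan((freqs - fr) / (b * erb_hz(fr)))
predicted = np.exp(c * theta)

# Normalize both at f_r so the constant gain factor drops out.
measured_n = measured / measured[1]
predicted_n = predicted / predicted[1]
```

With a negative c, the asymmetric function boosts the response below f_r and attenuates it above, which produces the sharper high-frequency drop-off described on the earlier slide.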
Comparison: Tone vs. Passive Chirp outputs
• Gammatone output seems to have better frequency resolution
• Passive Gammachirp output seems to have better time resolution
Incorporating level dependency
• As illustrated in the previous slide, the passive Gammachirp output offers little advantage on clean speech when fixed stimulus levels are used
• We can incorporate parameter control via feedback:
  1) Compute the passive GC spectrogram
  2) Segment into frames S1, S2, ..., SN-1, SN
  3) For each time frame:
    • Get the stimulus level per channel
    • Filter with the level-specific filter
  4) Reconstruct the frames
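The frame-wise control loop might look like the sketch below. It is a simplified illustration only: the slides apply a level-specific asymmetric filter in each frame, whereas here a hypothetical scalar compressive gain (`level_specific_gain_db`, with made-up `knee_db` and `ratio` parameters) stands in for that filter so the control flow stays visible. Non-overlapping frames and an RMS level estimate are also assumptions.

```python
import numpy as np

def frame_levels_db(channel, frame_len, ref=1.0, floor=1e-12):
    """RMS level (dB re `ref`) of each non-overlapping frame of one channel."""
    n_frames = len(channel) // frame_len
    frames = channel[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20.0 * np.log10(np.maximum(rms, floor) / ref)

def level_specific_gain_db(level_db, knee_db=30.0, ratio=0.5):
    """HYPOTHETICAL stand-in for the level-specific filter.

    Above `knee_db`, output level grows by only `ratio` dB per input dB.
    """
    excess = np.maximum(level_db - knee_db, 0.0)
    return -(1.0 - ratio) * excess

def apply_level_dependency(passive_out, frame_len):
    """passive_out: array of shape (channels, samples) from the passive filterbank."""
    out = np.array(passive_out, dtype=float)
    for ch in range(out.shape[0]):
        levels = frame_levels_db(out[ch], frame_len)        # level per frame
        gains = 10.0 ** (level_specific_gain_db(levels) / 20.0)
        n = len(gains) * frame_len
        out[ch, :n] *= np.repeat(gains, frame_len)          # reconstruct frames
    return out

# One channel: a quiet frame (0 dB) followed by a loud frame (60 dB).
passive = np.array([[1.0] * 4 + [1000.0] * 4])
compressed = apply_level_dependency(passive, frame_len=4)
```

In this toy input the quiet frame passes unchanged, while the loud frame sits 30 dB above the assumed knee and is attenuated by 15 dB.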
Sample outputs
[Figure: output panels labeled Clean, 40 dB SNR, 30 dB SNR, 20 dB SNR]
References
• Bleeck, S., Patterson, R.D., and Ives, T. (2003). Auditory Image Model for Matlab. Centre for the Neural Basis of Hearing. http://www.mrc-cbu.cam.ac.uk/cnbh/aimmanual/Introduction/
• Irino, T. and Patterson, R.D. (2001). “A compressive gammachirp auditory filter for both physiological and psychophysical data,” J. Acoust. Soc. Am. 109, 2008-2022.
• Pickles, J.O. (1988). An Introduction to the Physiology of Hearing (Academic, London).
• Slaney, M. (1993). “An efficient implementation of the Patterson-Holdsworth auditory filterbank,” Apple Computer Technical Report #35.
• Slaney, M. (1998). “Auditory Toolbox for Matlab,” Interval Research Technical Report #1998-010. http://rvl4.ecn.purdue.edu/~malcolm/interval/1998-010/
Sidenote
[Figure: panels labeled Clean, 40 dB SNR, 30 dB SNR]