
A NEW FEATURE EXTRACTION MOTIVATED BY HUMAN EAR

A NEW FEATURE EXTRACTION MOTIVATED BY HUMAN EAR. Amin Fazel Sharif University of Technology Hossein Sameti, S. K. Ghiathi February 2005. Outline. Introduction Physiological basis in the human auditory system Modeling of the basilar membrane and hair cells Experimental results


Presentation Transcript


  1. A NEW FEATURE EXTRACTION MOTIVATED BY HUMAN EAR Amin Fazel Sharif University of Technology Hossein Sameti, S. K. Ghiathi February 2005

  2. Outline • Introduction • Physiological basis in the human auditory system • Modeling of the basilar membrane and hair cells • Experimental results • Summary and conclusions Department of Computer Engineering

  3. Introduction • Speech is the primary real-time communication medium among humans. • Advantages of a voice interface to machines: • Hands-free operation • Speed • Ease of use

  4. Introduction • Human listeners are an existence proof that high-performance speech recognition is possible in noisy environments. [Figure: Wall Street Journal/Broadcast News readings, 5000 words; untrained human listeners vs. the Cambridge HTK LVCSR system]

  5. Physiological Basis

  6. Physiological Basis • The semicircular canals are the body's balance organs. • Hair cells in the canals detect movements of the canal fluid caused by angular acceleration. • The canals are connected to the auditory nerve. [Figure labels: Semicircular Canals, Inner Ear, Cochlea]

  7. Physiological Basis • The cochlea is a snail-shell-like structure of the inner ear, divided into three fluid-filled parts. • Two are canals (the scala tympani and the scala vestibuli) that transmit pressure; the third contains the sensitive organ of Corti, which detects pressure impulses and responds with electrical impulses that travel along the auditory nerve to the brain. [Figure labels: Semicircular Canals, Inner Ear, Cochlea]

  8. Physiological Basis • The organ of Corti can be thought of as the body's microphone. • Perception of pitch and perception of loudness are both connected with this organ. • It is situated on the basilar membrane in the cochlear duct. • It contains inner hair cells and outer hair cells. • Some 16,000–20,000 hair cells are distributed along the basilar membrane. • Vibrations of the oval window cause the cochlear fluid to vibrate. • This causes the basilar membrane to vibrate, producing a traveling wave. • The traveling wave bends the hair cells, which produces generator potentials. • If large enough, these potentials stimulate the fibers of the auditory nerve to produce action potentials. • The outer hair cells amplify vibrations of the basilar membrane. [Figure labels: Semicircular Canals, Inner Ear, Cochlea]

  9. Modeling of BM and Hair Cells • Different parts of the basilar membrane and hair cells are sensitive to different frequencies of the input signal.

  10. Modeling of BM and Hair Cells • Since the basilar membrane and hair cells together convert all frequencies of speech into mechanical energy, we can, to a good approximation, represent them discretely as a set of forced damped oscillators with different natural frequencies.
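
A minimal sketch of how such a bank of oscillators might be parameterized. It relies on the standard relation between the spring constant and the undamped natural frequency, k = m(2πf)². The unit mass, the function name `spring_constant`, and the four example frequencies are assumptions for illustration; the slides do not state which natural frequencies were used.

```python
import math

def spring_constant(natural_freq_hz: float, mass: float = 1.0) -> float:
    """Spring constant k that gives an undamped natural frequency of natural_freq_hz."""
    omega0 = 2.0 * math.pi * natural_freq_hz  # angular natural frequency (rad/s)
    return mass * omega0 ** 2

# Hypothetical bank: these natural frequencies are chosen only for illustration.
center_freqs_hz = [250.0, 500.0, 1000.0, 2000.0]
spring_constants = [spring_constant(f) for f in center_freqs_hz]
```

Each oscillator in the bank then differs only in its constant k, which matches the slides' description of oscillators with different natural frequencies.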

  11. Modeling of BM and Hair Cells • We stimulate these oscillators with the input sound. • In this simulation, each oscillator is a particle that is always pulled toward the center of oscillation. • The displacement of the particle from the center of oscillation is denoted by x, and the restoring force is −kx. • k is a constant specific to each oscillator.

  12. Modeling of BM and Hair Cells • Because an external force (imposed by the sound) acts on the system, we can no longer use the standard equations that assume the energy of the system is constant. If friction is ignored, the energy supplied by the sound is never dissipated and the system becomes unstable, so we must add a force opposing the movement. Since the direction of movement is given by the velocity v, the friction force is −bv. • Each diapason (oscillator) can be viewed as a filter. [Figure label: Bandwidth]
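
The filter view can be illustrated with the textbook steady-state response of a driven damped oscillator. The sketch below evaluates that response on a frequency grid and measures the half-power bandwidth numerically. For this model the bandwidth comes out as b/(2πm) in Hz, so the friction coefficient b sets the filter width. The values of f0, m and b are hypothetical and are not taken from the slides.

```python
import numpy as np

def amplitude_response(freq_hz, k, b, m=1.0):
    """Steady-state displacement amplitude per unit sinusoidal force amplitude."""
    w = 2.0 * np.pi * np.asarray(freq_hz, dtype=float)
    return 1.0 / np.sqrt((k - m * w**2) ** 2 + (b * w) ** 2)

f0 = 1000.0                      # hypothetical natural frequency (Hz)
m = 1.0                          # unit mass (assumption)
k = m * (2.0 * np.pi * f0) ** 2  # spring constant for that natural frequency
b = 2.0 * np.pi * 50.0           # hypothetical friction: about 50 Hz bandwidth

freqs = np.linspace(800.0, 1200.0, 4001)
# Velocity amplitude is w * displacement amplitude; absorbed power scales with its square.
power = (2.0 * np.pi * freqs * amplitude_response(freqs, k, b, m)) ** 2
peak_hz = freqs[np.argmax(power)]
above_half = freqs[power >= 0.5 * power.max()]
bandwidth_hz = above_half[-1] - above_half[0]
```

Here `peak_hz` lands at the natural frequency and `bandwidth_hz` is close to 50 Hz, so a larger b gives a wider, less selective filter.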

  13. Modeling of BM and Hair Cells • We model the state of each oscillator with the pair [x v], where x is the displacement and v is the velocity of the particle. [Equation not preserved in the transcript] • Here ∆t is the inverse of the sampling frequency.
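
The update equation itself did not survive in this transcript, so the following is only one plausible discretization, not necessarily the authors'. It advances the state [x v] by one sample period using a semi-implicit Euler step, with the acceleration obtained from the net force on the particle. The function name `step`, the unit mass, and all numeric values are assumptions.

```python
def step(x, v, force, k, b, dt, m=1.0):
    """Advance one oscillator state [x, v] by one sample period dt."""
    a = (force - k * x - b * v) / m   # net force: external, spring, friction
    v_new = v + a * dt                # update the velocity first...
    x_new = x + v_new * dt            # ...then the displacement, using the new velocity
    return x_new, v_new

fs = 16000.0          # hypothetical sampling frequency (Hz)
dt = 1.0 / fs         # dt is the inverse of the sampling frequency
x, v = 0.0, 0.0       # the oscillator starts at rest
x, v = step(x, v, force=1.0, k=1.0e6, b=100.0, dt=dt)
```

Updating the velocity before the displacement is a design choice made here for numerical stability. It stays well behaved only while the oscillator's angular natural frequency times ∆t remains comfortably below 2, so high natural frequencies need a high enough sampling rate.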

  14. Modeling of BM and Hair Cells • Three forces act on the particle: • The diapason itself pulls the particle back with force −kx. • The sound imposes an external force, Fexternal. • To compute Fexternal, we use the value of the current sample itself as the external force. • Friction opposes the movement with force −bv.
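
Summing these three forces and applying Newton's second law gives the acceleration of the particle. The mass m is an assumption introduced here, since the slides do not mention one; with m = 1 the acceleration is simply the net force.

```latex
m\,a = F_{\text{external}} - k\,x - b\,v
\quad\Longrightarrow\quad
a = \frac{F_{\text{external}} - k\,x - b\,v}{m}
```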

  15. Modeling of BM and Hair Cells • Now we can compute the acceleration a using the following formula. [Equation not preserved in the transcript] • To use this model for feature extraction: • After computing the energy of each oscillator, we use these energies as feature vectors in ASR systems.
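
The whole pipeline can be sketched as follows, under stated assumptions. Each sample drives every oscillator as the external force, the state is advanced with a semi-implicit Euler step, and the mean mechanical energy (kinetic plus potential) of each oscillator is returned as one feature. The energy definition, the shared friction coefficient, the unit mass and the function name `oscillator_energies` are assumptions; the slides do not specify them.

```python
import numpy as np

def oscillator_energies(signal, fs, center_freqs_hz, bandwidth_hz=50.0, m=1.0):
    """Mean mechanical energy of each oscillator when driven by `signal`."""
    dt = 1.0 / fs
    f0 = np.asarray(center_freqs_hz, dtype=float)
    k = m * (2.0 * np.pi * f0) ** 2                 # spring constant per oscillator
    b = m * 2.0 * np.pi * bandwidth_hz              # friction coefficient (assumed)
    x = np.zeros_like(f0)                           # displacements, one per oscillator
    v = np.zeros_like(f0)                           # velocities, one per oscillator
    energy = np.zeros_like(f0)
    for sample in signal:
        a = (sample - k * x - b * v) / m            # the sample value is the external force
        v = v + a * dt
        x = x + v * dt
        energy += 0.5 * m * v**2 + 0.5 * k * x**2   # kinetic + potential energy
    return energy / len(signal)

fs = 16000.0
t = np.arange(int(0.1 * fs)) / fs
tone = np.sin(2.0 * np.pi * 500.0 * t)              # 500 Hz test tone
features = oscillator_energies(tone, fs, [250.0, 500.0, 1000.0, 2000.0])
```

For the 500 Hz tone, the largest energy appears in the oscillator tuned to 500 Hz. In an ASR front end the same computation would be applied frame by frame to produce one feature vector per frame.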

  16. Experimental results • We transform a speech signal with our human-based model and compare the result with the spectrum of the same signal. • The two transformations differ only slightly.

  17. Experimental results • This comparison suggests that the human-based model can be used effectively in ASR systems. • In addition, the method can serve as an effective and fast signal transformation in place of the FFT or wavelet transforms in various tasks.

  18. ASR Experiments • The proposed feature extraction algorithm was tested on an English digit database. • For training, we use 1386 digit sequences spoken by 18 speakers. • For testing, we use 200 digit sequences uttered by speakers who are not in the training database. • The test database was split into four groups of 50 sequences, and a different one of four noise types was added to each group.
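
The slides do not say how the noise was mixed in. A common approach, sketched here as an assumption, is to scale the noise so that the ratio of speech power to noise power matches a target SNR in dB. The function name `add_noise_at_snr` and the synthetic stand-in signals are hypothetical.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    speech_power = np.mean(speech**2)
    noise_power = np.mean(noise**2)
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = np.sin(2.0 * np.pi * 440.0 * np.arange(8000) / 8000.0)  # stand-in for a speech signal
noise = rng.standard_normal(8000)                                 # stand-in for recorded noise
noisy = add_noise_at_snr(speech, noise, snr_db=10.0)
```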

  19. ASR Experiments • Recognition is performed using HTK. • Continuous HMMs with 16 emitting states and three mixtures • 3-state silence model • Single-state inter-digit pause model • In the reference experiments, MFCC_0_D_A is used. • It consists of 13 standard cepstral coefficients, including C0, augmented with their first and second derivatives. • MFCC features were generated by applying a Hamming window of size 25 ms with an overlap of 10 ms to the pre-emphasized signal, followed by a 23-channel Mel-scale filterbank. • The cepstral features were obtained from the DCT of the log-energies of the 23 frequency channels.
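
To make the derivative augmentation concrete, the sketch below appends first and second derivatives to a matrix of 13 static coefficients using the usual regression formula for delta features. The window of 2 frames and the edge handling (repeating the first and last frames) are assumptions rather than settings reported in the slides, and the random matrix only stands in for real MFCCs.

```python
import numpy as np

def delta(features, window=2):
    """Regression-based time derivative of a (frames, coeffs) feature matrix."""
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[0]
    # Repeat the first and last frames so every frame has `window` neighbours.
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(theta**2 for theta in range(1, window + 1))
    out = np.zeros_like(features)
    for theta in range(1, window + 1):
        ahead = padded[window + theta : window + theta + n_frames]
        behind = padded[window - theta : window - theta + n_frames]
        out += theta * (ahead - behind)
    return out / denom

static = np.random.default_rng(0).standard_normal((100, 13))  # stand-in for 13 MFCCs incl. C0
d1 = delta(static)                   # first derivatives
d2 = delta(d1)                       # second derivatives
full = np.hstack([static, d1, d2])   # 13 + 13 + 13 = 39 features per frame
```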

  20. ASR Experiments • Car Noise

  21. ASR Experiments • Exhibition Noise

  22. ASR Experiments • Babble Noise

  23. ASR Experiments • Subway Noise

  24. ASR Experiments • On noise-contaminated speech, HEFE (the proposed human-ear-motivated features) outperforms MFCC for all noise types at most SNR levels. • For babble noise, HEFE performs significantly better than MFCC. • For subway noise, the improvement from HEFE is smallest, but still noticeable.

  25. Summary • In this paper we have introduced a simple, physiologically motivated model of the basilar membrane and hair cells. • We use this model for feature extraction in ASR systems. • These features significantly outperform MFCC features under babble noise.

  26. Thank you!
