EEL6586 Automatic Speech Processing . Meena Ramani 04/07/04 Special thanks to Dr. Mark Skowronski. Topics. Anatomy of the Ear and Hearing Auditory perception Hearing aids and Cochlear implants. The Incredible sense of Hearing.
EEL6586 Automatic Speech Processing Meena Ramani 04/07/04 Special thanks to Dr. Mark Skowronski
Topics • Anatomy of the Ear and Hearing • Auditory perception • Hearing aids and Cochlear implants.
The Incredible Sense of Hearing “Behind these unprepossessing flaps ... lie structures of such delicacy that they shame the most skillful craftsman.” Stevens, S.S. [Professor of Psychophysics, Harvard University]
Why study hearing? • Best example of Speech Recognition • Mimic Human Speech Processing • Hearing Aids/ Cochlear implants • Speech Coding
• The stapes or stirrup is the smallest bone in our body. It is roughly the size of a grain of rice, ~2.5 mm • The movement of the eardrum in response to the minimum audible sound is less than the diameter of a hydrogen atom • The inner ear has reached its full adult size and shape when the fetus is 20-22 weeks old • Even during sleep the ear continues to function with incredible efficiency • The ears are responsible for keeping the body in balance • Hearing loss is the number one disability in the world • Percentage of people who lose their hearing at age 19 and over: 76.3%
Dynamic Range of Hearing • The practical dynamic range could be said to be from the threshold of hearing to the threshold of pain • Sound level measurements in decibels are generally referenced to a standard threshold of hearing at 1000 Hz for the human ear, which can be stated in terms of sound intensity: I0 = 10^-12 W/m^2 • Dynamic range is enhanced by an effective amplification structure which extends its low end and by a protective mechanism which extends the high end.
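As a rough illustration of the decibel scale described on this slide, the sketch below converts sound intensity to a level in dB relative to the conventional reference intensity of 10^-12 W/m^2. The pain-threshold intensity of about 10 W/m^2 is an assumption used only for illustration; it does not come from the slides.

```python
import math

# Conventional reference intensity for 0 dB at 1000 Hz.
I0_W_PER_M2 = 1e-12

def intensity_level_db(intensity_w_per_m2, reference=I0_W_PER_M2):
    """Sound intensity level in dB relative to the reference intensity."""
    return 10.0 * math.log10(intensity_w_per_m2 / reference)

# Threshold of hearing sits at 0 dB by definition.
hearing_db = intensity_level_db(1e-12)

# ASSUMPTION: a pain threshold near 10 W/m^2 (illustrative value only).
pain_db = intensity_level_db(10.0)

print(f"threshold of hearing: {hearing_db:.0f} dB")
print(f"assumed threshold of pain: {pain_db:.0f} dB")
```

Under these assumptions the practical dynamic range spans roughly thirteen orders of magnitude in intensity.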
A N A T O M Y
Pinna /Auricle Outer Ear Auditory Canal • Focuses sound waves (variations in pressure) into the ear canal • Sound spreads out according to Inverse Square Law • A larger pinna captures more of the wave and hence more sound energy. • Elephants: Hear Low frequency sound from up to 5 miles away • Human Pinna structure: Pointed forward & has a number of curves. • More sensitive to sounds in front • Dogs/ Cats- Movable Pinna => focus on sounds from a particular direction
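The inverse square law and the "larger pinna captures more energy" point can be sketched numerically. This is a minimal free-field model assuming a point source radiating equally in all directions; the function names and the example numbers are hypothetical.

```python
import math

def intensity_at_distance(source_power_w, distance_m):
    """Intensity (W/m^2) of a point source spreading over a sphere."""
    return source_power_w / (4.0 * math.pi * distance_m ** 2)

def captured_power(source_power_w, distance_m, aperture_area_m2):
    """Power (W) intercepted by a collecting area facing the source."""
    return intensity_at_distance(source_power_w, distance_m) * aperture_area_m2

# Doubling the distance cuts the intensity to one quarter.
near = intensity_at_distance(1.0, 1.0)
far = intensity_at_distance(1.0, 2.0)

# Doubling the collecting area doubles the captured power.
small_pinna = captured_power(1.0, 10.0, 0.001)  # hypothetical area in m^2
large_pinna = captured_power(1.0, 10.0, 0.002)

print(near / far, large_pinna / small_pinna)
```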
Pinna /Auricle Outer Ear Auditory Canal Sound Localization • Horizontal localization: is the sound on your right or left side? Uses interaural differences: Interaural Time Difference (ITD) and Interaural Intensity Difference (IID) • Vertical localization
Interaural differences • The direct path from the acoustic source to the two ears will generally be different - The signal needs to travel farther to the more distant ear - The more distant ear is partially occluded by the head • Two types of interaural difference will emerge - Interaural time difference (ITD) - Interaural intensity difference (IID)
[Figure: Schematic illustration of interaural differences. Left-ear and right-ear waveforms plotted against time from sound onset, marking the arrival time difference, the ongoing time difference, and the intensity difference.]
Thresholds Interaural time differences (ITDs) • Threshold ITD 10-20 µs (~0.7 cm path difference) Interaural intensity differences (IIDs) • Threshold IID 1 dB
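The path-length figure on this slide can be checked with a one-line conversion: a time difference multiplied by the speed of sound gives the extra distance travelled to the farther ear. The sketch assumes a speed of sound of 343 m/s (air at about 20 °C), which is not stated in the slides.

```python
# ASSUMPTION: speed of sound in air at roughly 20 degrees C.
SPEED_OF_SOUND_M_PER_S = 343.0

def path_difference_cm(itd_seconds):
    """Extra path length (cm) corresponding to an interaural time difference."""
    return SPEED_OF_SOUND_M_PER_S * itd_seconds * 100.0

for itd_us in (10, 20):
    cm = path_difference_cm(itd_us * 1e-6)
    print(f"ITD {itd_us} microseconds -> {cm:.2f} cm")
```

A path difference of about 0.7 cm therefore corresponds to a time difference of about 20 microseconds.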
D U P L E X  T H E O R Y Interaural time differences (ITDs): low frequencies • Ongoing disparities can only be detected for frequencies up to around 1500 Hz; sensitivity declines rapidly above 1000 Hz • The auditory system assumes that the smallest phase difference corresponds to the true ITD • For frequencies below 700 Hz, this strategy will always give the correct answer Interaural intensity differences (IIDs): high frequencies • The amount of attenuation varies across frequency • Below 500 Hz, IIDs are negligible (due to diffraction) • From 2-4 kHz, IIDs of 10 dB occur for sources located at 90º • IIDs can reach up to 20 dB at high frequencies
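The "smallest phase difference" strategy and its roughly 700 Hz limit can be sketched as follows. For a pure tone, a delay is only observable modulo one period, so the smallest-phase estimate is the true ITD wrapped into half a period either side of zero. The maximum ITD of 0.7 ms is an assumed head-size value, not a figure from the slides.

```python
# ASSUMPTION: largest ITD a human head produces, about 0.7 ms.
MAX_ITD_S = 0.0007

def smallest_phase_itd(true_itd_s, freq_hz):
    """ITD implied by the smallest phase difference for a pure tone.

    The true delay is wrapped into the interval [-T/2, T/2), where T is
    the period of the tone.
    """
    period = 1.0 / freq_hz
    return (true_itd_s + period / 2.0) % period - period / 2.0

# The estimate is always right while half a period exceeds the maximum ITD.
ambiguity_limit_hz = 1.0 / (2.0 * MAX_ITD_S)

low = smallest_phase_itd(MAX_ITD_S, 500.0)    # recovers 0.7 ms
high = smallest_phase_itd(MAX_ITD_S, 1200.0)  # wraps to the wrong side

print(f"limit ~ {ambiguity_limit_hz:.0f} Hz, 500 Hz -> {low * 1e3:.2f} ms, "
      f"1200 Hz -> {high * 1e3:.2f} ms")
```

With these assumptions the limit comes out near 714 Hz, in line with the slide's figure; above it, the wrapped estimate can point to the wrong side of the head.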
Pinna /Auricle Outer Ear Auditory Canal Sound Localization • Horizontal localization • Vertical localization: is the sound above or below? Pinna Directional Filtering • The pinna amplifies sounds from above and from below differently • Curves in its structure selectively amplify certain parts of the sound spectrum
Sound localization of Barn Owls and cats • In a Barn Owl, the left ear opening is higher than the right, so a sound coming from below the Owl's line of sight will reach the right ear first. • Hearing sensitivity comparison of Barn Owls, Cats & Humans • Both the cat and the Barn Owl have much more sensitive hearing than the human in the range of about 0.5 to 10 kHz. • The cat and Barn Owl have a similar sensitivity up to approximately 7 kHz. • Beyond this point the Barn Owl's sensitivity declines sharply.
Project 1: Using Head-Related Transfer Functions to deliver speech for virtual reality applications • The simplest spatial audio systems are limited to localizing in azimuth only. • To go beyond the limited capabilities of these approaches, we need to use Head-Related Transfer Functions (HRTFs). • The impulse response from the source to the eardrum is called the Head-Related Impulse Response (HRIR), and its Fourier transform H(f) is called the Head-Related Transfer Function (HRTF). • It accounts for diffraction around the head, reflections from the shoulders and, most significantly, reflections from the pinnae.
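The HRIR/HRTF relationship described here can be sketched with NumPy: spatializing a mono signal is a convolution with each ear's HRIR, and the HRTF is the Fourier transform of the HRIR. The HRIRs below are hypothetical toy responses (a pure delay and attenuation at the far ear), not measured data, and the function names are illustrative.

```python
import numpy as np

FS_HZ = 44100  # assumed sample rate

# HYPOTHETICAL toy HRIRs: the left ear receives the sound directly, the
# right ear receives it 20 samples (~0.45 ms) later and attenuated.
hrir_left = np.zeros(64)
hrir_left[0] = 1.0
hrir_right = np.zeros(64)
hrir_right[20] = 0.6

def spatialize(mono, hrir_l, hrir_r):
    """Filter a mono signal with a pair of HRIRs to get left/right signals."""
    return np.convolve(mono, hrir_l), np.convolve(mono, hrir_r)

def hrtf(hrir, n_fft=256):
    """HRTF as the (one-sided) Fourier transform of the HRIR."""
    return np.fft.rfft(hrir, n=n_fft)

rng = np.random.default_rng(0)
mono = rng.standard_normal(1000)  # stand-in for a speech segment
left, right = spatialize(mono, hrir_left, hrir_right)
H_right = hrtf(hrir_right)

print(left.shape, right.shape, np.abs(H_right[:3]))
```

Measured HRIRs would additionally encode the pinna and shoulder reflections, which show up as frequency-dependent peaks and notches in the HRTF magnitude.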
Project 2: Beamforming and Direction of Arrival (DOA) [Figure panels: Frequency Dependent, Frequency Independent] • Most DOA algorithms (e.g. MUSIC, ESPRIT) apply eigendecomposition to the spatial correlation matrix to separate the signal and noise subspaces • A more biologically inspired DOA algorithm might do better
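For readers unfamiliar with the subspace approach mentioned above, here is a minimal narrowband MUSIC sketch. It assumes a uniform linear array with half-wavelength spacing, one far-field source, and simulated data; the array geometry, noise level, and function names are all illustrative assumptions rather than anything specified in the slides.

```python
import numpy as np

def steering_vector(theta_deg, n_sensors, spacing_wavelengths=0.5):
    """Array response of a uniform linear array to a plane wave."""
    theta = np.deg2rad(theta_deg)
    k = np.arange(n_sensors)
    return np.exp(-2j * np.pi * spacing_wavelengths * k * np.sin(theta))

def music_spectrum(snapshots, n_sources, scan_deg):
    """MUSIC pseudo-spectrum from sensor snapshots (sensors x time)."""
    n_sensors, n_snap = snapshots.shape
    # Sample spatial correlation matrix.
    R = snapshots @ snapshots.conj().T / n_snap
    # eigh returns eigenvalues in ascending order, so the first columns
    # span the noise subspace.
    _, eigvecs = np.linalg.eigh(R)
    noise_subspace = eigvecs[:, : n_sensors - n_sources]
    spectrum = np.empty(len(scan_deg))
    for i, angle in enumerate(scan_deg):
        a = steering_vector(angle, n_sensors)
        proj = noise_subspace.conj().T @ a
        # Peaks occur where the steering vector is orthogonal to the noise subspace.
        spectrum[i] = 1.0 / np.real(np.vdot(proj, proj))
    return spectrum

# Simulated data: one source at 20 degrees, 8 sensors, 200 snapshots.
rng = np.random.default_rng(0)
n_sensors, n_snap, true_deg = 8, 200, 20.0
source = rng.standard_normal(n_snap) + 1j * rng.standard_normal(n_snap)
noise = 0.1 * (rng.standard_normal((n_sensors, n_snap))
               + 1j * rng.standard_normal((n_sensors, n_snap)))
x = np.outer(steering_vector(true_deg, n_sensors), source) + noise

scan = np.arange(-90.0, 90.5, 0.5)
estimated_deg = scan[np.argmax(music_spectrum(x, 1, scan))]
print(f"estimated direction of arrival: {estimated_deg:.1f} degrees")
```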
Pinna /Auricle Outer Ear Auditory Canal • Auditory canal length 2.7 cm • Can model the canal as a ¼-wave resonator • Resonance frequency ~3 kHz • Boosts energy between 2-5 kHz by up to 15 dB • Correspondingly, the hearing curves show a significant dip in the range 2000-5000 Hz with a peak sensitivity around 3500-4000 Hz. • The high-sensitivity region at 2-5 kHz is very important for the understanding of speech.
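The quarter-wave resonator model can be checked directly: a tube closed at one end (the eardrum) and open at the other resonates when its length equals a quarter wavelength, so f = c / (4L). The speed of sound of 343 m/s is an assumption; the 2.7 cm length is taken from the slide.

```python
# ASSUMPTION: speed of sound in air at roughly 20 degrees C.
SPEED_OF_SOUND_M_PER_S = 343.0

def quarter_wave_resonance_hz(tube_length_m):
    """Fundamental resonance of a tube closed at one end: f = c / (4 L)."""
    return SPEED_OF_SOUND_M_PER_S / (4.0 * tube_length_m)

canal_resonance_hz = quarter_wave_resonance_hz(0.027)  # 2.7 cm canal
print(f"ear canal resonance ~ {canal_resonance_hz:.0f} Hz")
```

The result lands close to the slide's figure of about 3 kHz.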
A N A T O M Y
Eardrum Middle Ear Ossicles Oval window Functions of the Middle Ear • Impedance matching between vibrations in air and the liquid medium in the inner ear. - The acoustic impedance of the fluid is 4000x that of air. => Without matching, all but 0.1% of the energy would be reflected back. • Stapedius reflex (explained later) • The tympanic membrane or "eardrum" receives vibrations traveling up the auditory canal and transfers them through the tiny ossicles to the oval window.
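The 0.1% figure follows from the standard power transmission formula for a plane wave hitting a boundary between two media at normal incidence, T = 4r / (1 + r)^2, where r is the impedance ratio. The sketch below applies it to the 4000:1 ratio quoted on the slide; treating the air-to-fluid boundary as a simple planar interface is a simplifying assumption.

```python
import math

def transmitted_fraction(impedance_ratio):
    """Fraction of incident power transmitted across an impedance step."""
    return 4.0 * impedance_ratio / (1.0 + impedance_ratio) ** 2

ratio = 4000.0  # fluid impedance relative to air, from the slide
T = transmitted_fraction(ratio)
loss_db = 10.0 * math.log10(T)

print(f"transmitted: {T * 100:.2f}% of the energy ({loss_db:.1f} dB)")
```

Without the middle ear, roughly 30 dB would be lost at the air-fluid boundary.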
Eardrum Middle Ear Ossicles Oval window Eardrum -> Malleus -> Incus -> Stapes -> Oval Window Ossicles: 3 bones: Malleus (Hammer), Incus (Anvil), Stapes (Stirrup) • Amplification by lever action: < 3x • Area amplification: 15x • Large area of eardrum (55 mm2), small area of stapes (3.2 mm2) • Increases effective force per unit area. Stapedius Reflex: protection against low-frequency sounds • Muscle tenses -> stiffens vibration of the ossicles -> reduces sound transmitted (by 20 dB) • The reflex is triggered by loud sounds
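The two amplification mechanisms multiply: the same force concentrated on a smaller area raises pressure by the area ratio, and the lever action scales the force. The sketch below uses the areas from the slide; the lever ratio of 1.3 is a hypothetical value chosen from within the slide's "< 3x" bound, so the resulting gain is illustrative only.

```python
import math

EARDRUM_AREA_MM2 = 55.0   # from the slide
STAPES_AREA_MM2 = 3.2     # from the slide

def middle_ear_pressure_gain(lever_ratio):
    """Pressure gain = (eardrum area / stapes area) * lever ratio."""
    area_ratio = EARDRUM_AREA_MM2 / STAPES_AREA_MM2
    return area_ratio * lever_ratio

# ASSUMPTION: lever ratio of 1.3, an illustrative value below the 3x bound.
gain = middle_ear_pressure_gain(1.3)
gain_db = 20.0 * math.log10(gain)  # pressure ratio, hence 20 log10

print(f"area ratio ~ {EARDRUM_AREA_MM2 / STAPES_AREA_MM2:.1f}, "
      f"pressure gain ~ {gain:.1f}x ({gain_db:.1f} dB)")
```

The listed areas give an area ratio of about 17, the same order as the 15x quoted on the slide.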
A N A T O M Y
Semicircular Canals Inner Ear Cochlea • The semicircular canals are the body's balance organs. • Hair cells, in the canals, detect movements of the fluid in the canals caused by angular acceleration • The canals are connected to the auditory nerve.
Semicircular Canals Inner Ear Cochlea • The inner ear structure called the cochlea is a snail-shell-like structure divided into three fluid-filled parts. • Two are canals (scala tympani and scala vestibuli) for the transmission of pressure, and in the third is the sensitive organ of Corti, which detects pressure impulses and responds with electrical impulses which travel along the auditory nerve to the brain. [Figure: This mid-modiolar section shows the coiling of the cochlear duct (1), the scala vestibuli (2) and scala tympani (3). The red arrow is from the oval window, the blue arrow points to the round window. Within the modiolus, the spiral ganglion (4) and auditory nerve fibres (5) are seen.]
Semicircular Canals Inner Ear Cochlea • The organ of Corti can be thought of as the body's microphone. • Perception of pitch and perception of loudness are connected with this organ. • It is situated on the basilar membrane in the cochlear duct. • It contains inner hair cells and outer hair cells. • There are some 16,000-20,000 hair cells distributed along the basilar membrane. • Vibrations of the oval window cause the cochlear fluid to vibrate. • This causes the basilar membrane to vibrate, producing a traveling wave. • This bends the hair cells, which produces generator potentials. • If large enough, these potentials stimulate the fibers of the auditory nerve to produce action potentials. • The outer hair cells amplify vibrations of the basilar membrane.
The cochlea works as a frequency analyzer. It operates on the incoming sound’s frequencies. • Frequency Theory • Place Theory
Frequency Theory • The basilar membrane (BM) vibrates in synchrony with the sound entering the ear, producing action potentials (APs) in auditory nerve cells at the same frequency (e.g., 50 Hz sound -> 50 APs/sec). • Limitation: the maximum firing rate is about 200 APs/sec. • Use this theory for frequencies < 100 Hz.
Place Theory • High-frequency sounds selectively vibrate the BM of the inner ear near the oval window. • => Each position along the BM has a characteristic frequency at which it has maximum vibration. • Lower frequencies travel further along the membrane before causing excitation of the membrane. • The place along the basilar membrane where maximum excitation of the hair cells occurs determines the perception of pitch. • At the base, the basilar membrane is stiff and thin (more responsive to high Hz). • At the end or “apex”, the basilar membrane is wide and floppy (more responsive to low Hz). [Figure: basilar membrane, 32-35 mm long; labels 4 mm2 and 1 mm2]
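One common way to put numbers on the place-frequency relationship is Greenwood's function, which is not mentioned in the slides and is brought in here as an outside model. With the usual human constants it maps relative position along the basilar membrane to characteristic frequency. The 35 mm length is taken from the upper end of the slide's range.

```python
BM_LENGTH_MM = 35.0  # upper end of the 32-35 mm range on the slide

def greenwood_frequency_hz(distance_from_base_mm, length_mm=BM_LENGTH_MM):
    """Characteristic frequency at a place on the basilar membrane.

    Uses Greenwood's human fit f = 165.4 * (10**(2.1 * x) - 0.88), where
    x is the fractional distance from the apex (0 = apex, 1 = base).
    This model is an outside assumption, not part of the slides.
    """
    x = 1.0 - distance_from_base_mm / length_mm
    return 165.4 * (10.0 ** (2.1 * x) - 0.88)

for d in (0.0, 10.0, 20.0, 35.0):
    print(f"{d:4.0f} mm from base -> {greenwood_frequency_hz(d):8.0f} Hz")
```

Consistent with the slide, high frequencies map near the base (oval window) and low frequencies near the apex.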
Tuning curves of auditory nerve fibers • Tonotopic map on the cochlea: cells in different spots on the cochlea respond to different frequencies, with high frequencies near the base and low frequencies near the apex. • Method to verify: - Apply 50 ms tone bursts every 100 ms - Increase sound level until discharge rate increases by 1 spike - Repeat for all frequencies • The response curve is a band-pass filter (BPF) with almost constant Q (= f0/BW)
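A constant Q means that bandwidth grows in proportion to center frequency, since BW = f0 / Q. The short sketch below tabulates this for a few center frequencies; the Q value of 8 is a hypothetical number chosen only to make the arithmetic visible.

```python
def bandwidth_hz(center_hz, q):
    """Bandwidth of a band-pass response with quality factor Q = f0 / BW."""
    return center_hz / q

Q = 8.0  # HYPOTHETICAL quality factor, for illustration only
centers_hz = [250.0, 1000.0, 4000.0]
bandwidths = [bandwidth_hz(f0, Q) for f0 in centers_hz]

for f0, bw in zip(centers_hz, bandwidths):
    print(f"f0 = {f0:6.0f} Hz -> BW = {bw:6.2f} Hz")
```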
Auditory Neuron • The auditory nerve takes electrical impulses from the cochlea and the semicircular canals and makes connections with both auditory areas of the brain. Auditory Area of Brain • Information from both ears goes to both sides of the brain: binaural information is present in all of the major relay stations. [Figure legend: dashed line = left ear information, solid line = right ear information]
Auditory Neurons: Adaptation • When a stimulus is suddenly applied, the spike rate of an auditory nerve fiber increases rapidly • If the stimulus remains (e.g. a steady tone), the rate decreases exponentially • Spontaneous rate: neuron firings in the absence of a stimulus • The neuron is more responsive to changes than to steady inputs
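The adaptation behaviour described above can be sketched as a single exponential decay from an onset rate toward a steady-state rate. All four parameters (spontaneous, onset and steady rates, and the time constant) are hypothetical values chosen for illustration, and a single time constant is itself a simplifying assumption.

```python
import math

def firing_rate(t_s, spontaneous=20.0, onset=300.0, steady=100.0, tau_s=0.05):
    """Spike rate (spikes/s) for a steady tone switched on at t = 0.

    Before onset the fiber fires at its spontaneous rate; afterwards the
    rate jumps to `onset` and decays exponentially toward `steady`.
    All default values are HYPOTHETICAL.
    """
    if t_s < 0:
        return spontaneous
    return steady + (onset - steady) * math.exp(-t_s / tau_s)

for t in (-0.1, 0.0, 0.05, 0.2, 1.0):
    print(f"t = {t:5.2f} s -> {firing_rate(t):6.1f} spikes/s")
```

The large jump at onset followed by decay is what makes the fiber respond more strongly to changes than to a steady input.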
Perception of Sound Threshold of hearing • How it is measured • Age effects Equal Loudness curves Bass loss problem Critical bands Frequency Masking Temporal Masking
Threshold of Hearing • Hearing area is the area between the threshold in quiet and the threshold of pain. • Note: the threshold in quiet shifts for those who listen to loud music. • The sound intensity required to be heard is quite different for different frequencies. • Threshold of hearing at 1000 Hz is nominally taken to be 0 dB. • Marked discrimination against low frequencies, so that about 60 dB is required to be heard at 30 Hz. • The maximum sensitivity at about 3500 to 4000 Hz is related to the resonance of the auditory canal.
Bekesy Tracking • Used to measure threshold in quiet or JNL of a test tone • STEPS: • Play a tone • Vary its amplitude till it's audible • Then the tone’s amplitude is reduced to definitely inaudible and the frequency is slowly changed • Then increase the SPL till you can hear it, and so on. • The whole recording will last at least 15 minutes • Change the level in fine steps (< 2 dB), else clicks become audible and act as a cue to the listener
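The up-and-down logic of the tracking procedure can be sketched as a small simulation. This is a heavily simplified version: the frequency is held fixed, and the listener is an idealized one who hears the tone exactly when its level is at or above a hypothetical true threshold. The level rises while the tone is inaudible and falls while it is audible, and the threshold estimate is the mean of the levels at which the direction reverses.

```python
def bekesy_track(true_threshold_db, start_db=0.0, step_db=1.0, n_reversals=10):
    """Estimate a threshold by up-down tracking with an ideal listener.

    The listener model (audible iff level >= true threshold) and all
    parameter values are HYPOTHETICAL simplifications.
    """
    level = start_db
    direction = +1  # start by increasing the level
    reversal_levels = []
    while len(reversal_levels) < n_reversals:
        audible = level >= true_threshold_db
        # Audible -> listener lowers the level; inaudible -> level rises.
        new_direction = -1 if audible else +1
        if new_direction != direction:
            reversal_levels.append(level)
            direction = new_direction
        level += direction * step_db
    return sum(reversal_levels) / len(reversal_levels)

estimate_db = bekesy_track(true_threshold_db=23.4)
print(f"estimated threshold: {estimate_db} dB")
```

With a step below 2 dB, as the slide recommends, the estimate stays within one step of the true threshold.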
Threshold in Quiet: variation with age • Hearing sensitivity decreases with age, especially at high frequencies • Note that we also lose the sensitivity at 3.5-4 kHz • Presbycusis: hearing loss because of age • Hair cells which process high frequencies are closest to the oval window and are often the first to be damaged.
Equal Loudness Curves • Loudness is not simply sound intensity! • It is a subjective term describing the strength of the ear's perception of a sound. • It has to include the ear's sensitivity to the particular frequencies contained in the sound, as in the equal loudness curves. • As a rule of thumb, sound must be increased in intensity by a factor of ten for it to be perceived as twice as loud.
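The factor-of-ten rule can be written as a formula: every tenfold increase in intensity (10 dB) doubles perceived loudness, so the loudness ratio is 2 raised to the base-10 logarithm of the intensity ratio. The sketch below treats the rule as exact, which is an idealization of a rule of thumb.

```python
import math

def loudness_ratio(intensity_ratio):
    """Perceived loudness ratio under the 'ten times intensity = twice
    as loud' rule of thumb (treated here as exact, an ASSUMPTION)."""
    return 2.0 ** math.log10(intensity_ratio)

for ratio in (10.0, 100.0, 1000.0):
    db = 10.0 * math.log10(ratio)
    print(f"{ratio:6.0f}x intensity (+{db:.0f} dB) -> "
          f"{loudness_ratio(ratio):.1f}x as loud")
```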
The Bass Loss Problem • For very soft sounds, near the threshold of hearing, the ear strongly discriminates against low frequencies. • For mid-range sounds around 60 phons, the discrimination is not so pronounced. • For very loud sounds in the neighborhood of 120 phons, the hearing response is more nearly flat. • E.g. rock music: volume too low -> no bass; volume too high -> too much bass
Ohm's law of hearing • The sound quality of a complex tone depends ONLY on the amplitudes and NOT on the relative phases of its harmonics.
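A quick numerical illustration of what the law refers to: two complex tones built from the same harmonic amplitudes but different phases have different waveforms yet identical magnitude spectra. The sketch only demonstrates that signal-level fact; it says nothing about perception, and the sample rate, fundamental, amplitudes and phases are arbitrary illustrative choices.

```python
import numpy as np

FS_HZ, N = 8000, 400        # 0.05 s window = exactly 10 periods of 200 Hz
F0_HZ = 200.0
AMPLITUDES = [1.0, 0.5, 0.25]  # first three harmonics
t = np.arange(N) / FS_HZ

def complex_tone(phases):
    """Sum of harmonics with fixed amplitudes and the given phases."""
    return sum(a * np.sin(2 * np.pi * F0_HZ * (k + 1) * t + p)
               for k, (a, p) in enumerate(zip(AMPLITUDES, phases)))

tone_a = complex_tone([0.0, 0.0, 0.0])
tone_b = complex_tone([0.0, np.pi / 2, np.pi])

mag_a = np.abs(np.fft.rfft(tone_a))
mag_b = np.abs(np.fft.rfft(tone_b))

print("waveforms equal:", np.allclose(tone_a, tone_b))
print("magnitude spectra equal:", np.allclose(mag_a, mag_b, atol=1e-6))
```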
Elephants • Sound Production • A typical male elephant’s rumble has an average minimum frequency of around 12 Hz, a female's rumble around 13 Hz and a calf's around 22 Hz. • Produce sounds ranging over more than 10 octaves, from 5 Hz to over 9,000 Hz • Produce very gentle, soft sounds as well as extremely powerful sounds (112 dB recorded a meter away) • Hearing • Wider tympanic membranes • Longer ear canals (20 cm) • Spacious middle ears => low-frequency detection
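The "more than 10 octaves" figure can be verified from the quoted frequency range, since each octave is a doubling of frequency and the number of octaves is the base-2 logarithm of the frequency ratio.

```python
import math

def octaves_between(low_hz, high_hz):
    """Number of octaves (frequency doublings) between two frequencies."""
    return math.log2(high_hz / low_hz)

elephant_octaves = octaves_between(5.0, 9000.0)
print(f"5 Hz to 9000 Hz spans {elephant_octaves:.2f} octaves")
```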