Interrupted speech perception
Su-Hyun Jin, Ph.D., University of Texas & Peggy B. Nelson, Ph.D., University of Minnesota
Introduction
• In everyday life, speech is not equally intelligible over time because of background noise
  • Background noise usually fluctuates rather than staying steady in time, amplitude, and frequency
• Normal-hearing (NH) listeners can take advantage of the fluctuating nature of noise
  • Masking release: the improvement in speech recognition in fluctuating noise compared with performance in steady noise
  • It reflects the listener's ability to resolve the background fluctuations and use the momentary decreases (dips) in the noise to extract speech information (Dubno et al., 2002; Nelson et al., 2003; Jin & Nelson, 2006)
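The definition of masking release above can be written as a simple difference score. The sketch below is illustrative only: the function name and the example scores are assumptions, not values from the study, and it takes masking release to be the fluctuating-noise score minus the steady-noise score in percentage points.

```python
def masking_release(score_fluctuating: float, score_steady: float) -> float:
    """Masking release in percentage points.

    Computed as the percent-correct score in fluctuating noise minus the
    percent-correct score in steady noise at the same SNR.
    """
    return score_fluctuating - score_steady


# Hypothetical percent-correct scores, for illustration only (not study data).
example_mr = masking_release(score_fluctuating=70.0, score_steady=30.0)
print(example_mr)  # 40.0
```

A positive value means the listener benefited from the dips in the noise; a value near zero means little or no release from masking.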
Speech and Noise (NH)
[Figure panels: quiet speech; speech in noise; speech in fluctuating noise]
Introduction
• Effect of noise on speech perception by hearing-impaired (HI) listeners
  • Little or no release from masking in fluctuating noise
• Even in listeners with mild hearing loss (Bacon et al., 1998; Dubno et al., 2002)
  • Speech recognition in steady noise was close to normal
  • Some listeners performed significantly worse than normal in fluctuating noise
• Speech perception in fluctuating noise may provide a more sensitive measure of impairment due to hearing loss
Jin & Nelson (2006)
• Investigated the relationship between the amount of masking release (MR) and both hearing sensitivity and temporal resolution in NH and HI listeners
• Sentence and consonant recognition in quiet, steady noise, and fluctuating noise
• Hearing sensitivity and forward masking
Jin & Nelson (2006): Amplification and shaping
• Two-stage process to amplify speech and noise for HI listeners
  • Shaping applied based on the half-gain rule to compensate for the hearing loss configuration
  • Overall amplification added to bring listeners to maximum sentence recognition (90% or better) in quiet
• Process applied to speech and noise for each HI listener individually
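A minimal sketch of the two-stage process, assuming the common reading of the half-gain rule (gain in dB at each frequency equals half the hearing threshold in dB HL). The function names, the audiogram values, and the 10 dB overall gain are hypothetical; in the study the overall gain was set per listener to reach maximum sentence recognition in quiet, which this sketch does not model.

```python
from typing import Dict


def half_gain_shaping(thresholds_db_hl: Dict[int, float]) -> Dict[int, float]:
    """Stage 1: frequency shaping, gain (dB) = half the threshold (dB HL)."""
    # Thresholds at or below 0 dB HL receive no shaping gain.
    return {freq: 0.5 * max(thr, 0.0) for freq, thr in thresholds_db_hl.items()}


def add_overall_gain(shaping_db: Dict[int, float], overall_db: float) -> Dict[int, float]:
    """Stage 2: add the same overall gain (dB) at every frequency."""
    return {freq: gain + overall_db for freq, gain in shaping_db.items()}


# Hypothetical audiogram (frequency in Hz -> threshold in dB HL), not study data.
audiogram = {500: 30.0, 1000: 40.0, 2000: 50.0, 4000: 60.0}
shaping = half_gain_shaping(audiogram)
total = add_overall_gain(shaping, overall_db=10.0)  # 10 dB is an assumed value
print(shaping)  # {500: 15.0, 1000: 20.0, 2000: 25.0, 4000: 30.0}
print(total)    # {500: 25.0, 1000: 30.0, 2000: 35.0, 4000: 40.0}
```

The same gains would be applied to both speech and noise, so the SNR is unchanged by the amplification.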
Jin & Nelson (2006)
• Role of hearing sensitivity
  • Listeners with sensorineural hearing loss showed reduced hearing sensitivity
  • Noise is more detrimental to HI listeners than to NH listeners because the redundancy of the speech signal is already reduced for them in quiet (Van Tasell, 1993)
• Role of temporal resolution
  • Compared with NH listeners, HI listeners are more affected by non-simultaneous maskers
  • Strong correlation between masking release and forward-masked threshold (Dubno et al., 2002)
  • Observed that syllable recognition in fluctuating noise might be associated with age-related increases in forward-masked thresholds
Result 1: Sentence recognition
[Figure: percent correct keyword identification at -5 dB SNR]
Result 1: Speech recognition for both sentence and syllable identification
• In quiet and in steady noise
  • There were no significant differences between the NH and HI groups (p > 0.177).
• In gated noise
  • Significant improvements in gated noise (over steady noise) were seen for both NH and HI listeners.
  • NH listeners could take advantage of gating to regain about 80% of their quiet performance, whereas HI listeners regained only 15% to 55%.
Result 1b: Percent correct syllable identification and MR at -5 dB SNR
Result 2: Forward-masked thresholds
• Masked thresholds for HI listeners were higher than those for NH listeners
  • Those with lower thresholds in quiet showed better thresholds in the presence of noise.
• The slopes of the recovery functions were shallower for HI than for NH listeners
  • HI listeners with close-to-normal recovery functions showed a relatively larger amount of masking release in fluctuating noise
Discussion
• Relation between MR and other measures
  • High negative correlation (r ≈ -0.8) between the amount of MR and the results of forward masking for both NH and HI groups
  • When the results of HI listeners alone were analyzed, only a few factors retained a strong correlation with MR:
    • Hearing thresholds at 0.5 and 1.0 kHz, and forward-masked thresholds at 2 kHz
  • The MR for sentence recognition and for CV syllable recognition were analyzed separately.
Result: Regression analyses
• The strength of the relationship between MR and the predictors:
  • For sentence recognition in gated noise, hearing sensitivity at low-to-mid frequencies (0.5 and 1 kHz) accounted for a substantial proportion of the variance in the MR.
  • For consonant recognition, forward-masked thresholds contributed primarily to the variance in the MR.
Discussion
• HI listeners who performed close to normal in quiet and in steady noise still showed reduced masking release for both sentence and consonant recognition in gated noise
• The overall pattern of masking release measured using sentences and using CV syllables was similar.
Discussion
• Several factors seemed to contribute to the MR
  • Hearing sensitivity at low-to-mid frequencies (0.5 kHz and 1.0 kHz) was strongly related to the MR for sentence recognition but not for syllable identification.
  • Forward-masked thresholds were more strongly related to the MR for CV syllable identification.
  • Consistent with Dubno et al. (2002)
Follow-up
• Continued to investigate additional factors that might contribute to the reduced sentence recognition in fluctuating noise for HI listeners
  • Auditory integration
  • Frequency resolution
Follow-up
• Role of spectral resolution
  • Hearing impairment is often associated with reduced frequency selectivity.
  • Reduced auditory representation of spectral peaks and valleys in speech (Miller et al., 1997)
• Cochlear implant listeners
  • Had little MR for sentence recognition in fluctuating noise (Nelson et al., 2003).
  • Are known to have near-normal temporal resolution (Nelson & Donaldson, 2001) but limited spectral resolution.
• Reduced MR may be related to broader auditory filters
Follow-up
• Role of auditory integration
  • Understanding speech in real life requires a listener to analyze complex sounds and separate the acoustic characteristics of the input signal from background noise.
  • This process is known as auditory stream segregation (Bregman, 1990).
  • Listeners seem to be able to segregate sounds into streams when the sounds differ in spectral cues (Rose & Moore) or in the harmonicity and fundamental frequency (F0) of a signal (Qin & Oxenham, 2003).
Follow-up
• Two tasks
  • Auditory filter characteristics
  • Auditory integration: interrupted sentence recognition
  • Listeners: the same NH and HI listeners who participated in Jin & Nelson (2006)
• Analysis
  • Examine the relationship between the MR from Jin & Nelson (2006) and both the auditory filter shape and interrupted speech recognition
Follow-up
• Auditory filter characteristics
  • Used the filter shape equation (Patterson et al., 1982)
  • Estimated the equivalent rectangular bandwidth (ERB) and slope (p) of the auditory filters at 2000 and 4000 Hz
• Recognition of IEEE sentences interrupted by silent gaps
  • IEEE sentences were gated at rates of 1, 2, 4, 8 and 16 Hz.
  • There was no noise (no masking).
  • Depending on the gate rate, whole words or only parts of words were available to listeners.
  • Percent correct keyword identification was recorded.
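How ERB and slope relate can be sketched with the single-parameter rounded-exponential, roex(p), filter shape. This is an assumption: the slides name Patterson et al. (1982) but do not say which roex variant was fitted, and the slope values below are hypothetical. For a symmetric roex(p) filter, W(g) = (1 + pg)·exp(-pg) with g = |f - fc| / fc, and the ERB works out to 4·fc / p, so a shallower slope (smaller p) means a wider filter.

```python
import math


def roex_weight(g: float, p: float) -> float:
    """roex(p) filter weight at normalized frequency deviation g = |f - fc| / fc."""
    return (1.0 + p * g) * math.exp(-p * g)


def erb_hz(fc_hz: float, p: float) -> float:
    """Closed-form ERB of a symmetric roex(p) filter: 4 * fc / p."""
    return 4.0 * fc_hz / p


def erb_numeric(fc_hz: float, p: float, steps: int = 20000, g_max: float = 2.0) -> float:
    """Numerical check: integrate both filter skirts with the trapezoid rule."""
    dg = g_max / steps
    area = 0.0
    for i in range(steps):
        g0 = i * dg
        area += 0.5 * (roex_weight(g0, p) + roex_weight(g0 + dg, p)) * dg
    # The peak weight is 1, so the ERB is the total area (two skirts) in Hz.
    return 2.0 * fc_hz * area


# Hypothetical slope values, for illustration only (not fitted study data).
print(erb_hz(2000.0, 25.0))                  # 320.0
print(round(erb_numeric(2000.0, 25.0), 2))   # 320.0
print(erb_hz(2000.0, 12.5))                  # 640.0 (half the slope, twice the ERB)
```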
Speech in gated noise vs. speech interrupted by silent gaps
[Figure panels: fluctuating noise at 8 Hz; interrupted sentence at 8 Hz]
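Interrupting a sentence with silent gaps can be sketched as multiplying the waveform by an on/off gate. The 50% duty cycle, the abrupt (unramped) square wave, and the 16 kHz sample rate are assumptions made for illustration; the slides give only the gate rates.

```python
from typing import List


def square_gate(n_samples: int, sample_rate_hz: float, gate_rate_hz: float,
                duty_cycle: float = 0.5) -> List[float]:
    """Square-wave gate: 1.0 during the 'on' phase of each cycle, 0.0 otherwise."""
    period = sample_rate_hz / gate_rate_hz  # samples per gate cycle
    gate = []
    for n in range(n_samples):
        phase = (n % period) / period  # position within the cycle, 0 <= phase < 1
        gate.append(1.0 if phase < duty_cycle else 0.0)
    return gate


def interrupt(signal: List[float], gate: List[float]) -> List[float]:
    """Replace the 'off' portions of the signal with silence."""
    return [s * g for s, g in zip(signal, gate)]


# One second at an assumed 16 kHz sample rate, gated at 8 Hz.
gate_8hz = square_gate(n_samples=16000, sample_rate_hz=16000.0, gate_rate_hz=8.0)
placeholder_speech = [1.0] * 16000  # stand-in for a real sentence waveform
interrupted = interrupt(placeholder_speech, gate_8hz)
print(sum(gate_8hz))  # 8000.0 (half of the samples are kept)
```

Applying the same gate to a noise masker instead of the speech would give the gated-noise condition, with the speech left intact.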
Follow-up Result 1: Frequency resolution
• Compared with the NH group, HI listeners showed greater ERBs and shallower slopes at both 2 kHz and 4 kHz.
• The average ERB for HI listeners (at both 2 kHz and 4 kHz) was about 2.2 to 2.7 times that of the NH group.
Follow-up Result 2: Interrupted sentence recognition
• The average percent correct scores of the NH group at each gate frequency were higher than the scores of HI listeners.
• The relationship between sentence recognition in gated noise and interrupted sentence recognition was significant for both NH and HI listeners (r ≈ 0.8).
  • When the scores of HI listeners only were compared, the correlation remained strong (r ≥ 0.8)
Discussion
• Several factors seemed to contribute to the MR
  • Hearing sensitivity at low-to-mid frequencies (0.5 kHz and 1.0 kHz) as well as the auditory filter shape at 2 and 4 kHz were strongly related to the MR for sentence recognition.
• Understanding speech interrupted either by noise or by silence might require a similar underlying integration process.
  • Percent correct scores for interrupted sentence recognition and sentence recognition in gated noise were strongly correlated.
  • Hearing sensitivity at the low-to-mid frequencies and ERBs were significant predictors of both the MR and interrupted sentence recognition
Follow-up 2: Interrupted sentence recognition
• The degree of MR in sentence recognition seemed to be correlated with low-to-mid frequency hearing sensitivity
  • Gaëtan & Christophe (2002) found that older listeners with mild HL put more perceptual weight on a mid-frequency band (750-1750 Hz) than NH listeners did.
  • Qin & Oxenham (2003) suggested that a strong pitch cue is important for a listener to segregate speech from noise.
• Low-to-mid frequency information may be more important for HI listeners when understanding sentences in competing noise
• Reducing low-frequency gain to improve comfort in noise may have unwanted consequences for HI listeners
Follow-up 2
• Task: understanding IEEE sentences
  • in quiet
  • in steady noise
  • in gated noise
  • gated with silent gaps
• Participants
  • 10 young adult NH listeners
Follow-up 2
• Stimuli
  • Both speech and noise were either unprocessed (natural) or processed through one of 4 different filters
  • Band-pass (bp) filter cutoff frequencies: 0.5-0.75 kHz, 0.75-1 kHz, 1-2 kHz, and 2-3 kHz
Follow-up 2
• Speech
  • IEEE sentences spoken by ten talkers
  • Presented at a comfortable level (70-75 dB SPL) for individual NH listeners
• Noise
  • Shaped to the long-term spectrum of speech and presented at an SNR of -5 dB
  • Steady or fluctuating
  • Gate frequency: 8 Hz
  • The RMS amplitude of the noise was adjusted relative to the RMS of the target speech
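Setting the noise level from the RMS of the target speech can be sketched as follows. This is a minimal illustration, not the authors' procedure: the signals are synthetic stand-ins, the function names are hypothetical, and the sketch assumes the RMS is taken over the whole ungated noise, which the slides do not specify.

```python
import math
import random
from typing import List


def rms(x: List[float]) -> float:
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def scale_noise_to_snr(speech: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Scale the noise so that 20*log10(rms(speech) / rms(noise)) equals snr_db."""
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        raise ValueError("noise is silent; cannot scale to a target SNR")
    target_noise_rms = rms(speech) / (10.0 ** (snr_db / 20.0))
    factor = target_noise_rms / noise_rms
    return [v * factor for v in noise]


def snr_db(speech: List[float], noise: List[float]) -> float:
    """Signal-to-noise ratio in dB, based on RMS amplitudes."""
    return 20.0 * math.log10(rms(speech) / rms(noise))


# Synthetic stand-ins: a 200 Hz tone for "speech" and uniform random noise.
rng = random.Random(0)
speech = [math.sin(2.0 * math.pi * 200.0 * n / 16000.0) for n in range(16000)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(16000)]

scaled_noise = scale_noise_to_snr(speech, noise, snr_db=-5.0)
print(round(snr_db(speech, scaled_noise), 6))  # -5.0
```

At -5 dB SNR the noise RMS is about 1.78 times the speech RMS, since 10^(5/20) ≈ 1.78.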
Results
• In quiet, performance was similar across filter conditions (above 90%), except for speech band-pass filtered at 1-2 kHz.
• In steady noise at -5 dB SNR, performance was quite low (below 5%),
  • except when only mid-to-high frequency information was audible (bp 2-3 kHz)
• In gated noise, performance was significantly better than in steady noise.
  • NH listeners were able to use the limited spectral information in the dips of the noise to separate speech from noise
  • except for bp 1-2 kHz
    • poorer performance both in quiet and in steady noise compared with the other filtering conditions
[Figure panels: percent correct keyword identification; masking release]
[Figure: keyword identification of interrupted sentences vs. masking release]
Results
• Performance in interrupted speech recognition showed a pattern similar to performance in gated noise
  • If only limited spectral information is available, performance is low in both gated noise and interrupted speech (for example, bp 1-2 kHz).
  • With more spectral cues available in the dips, performance was better in both conditions.
Discussion
• NH listeners were able to use limited spectral information to understand speech in quiet and in gated noise.
• Using high-frequency cues in filtered speech such as bp 2-3 kHz, NH listeners were able to segregate speech from gated noise better and showed a greater amount of masking release.
  • For NH listeners, high-frequency cues might be more important than low-frequency cues for speech recognition in complex noise backgrounds.
  • These cues would be less available to HI listeners
Discussion
• Similar performance in recognition of speech in gated noise and recognition of gated speech
  • Understanding interrupted speech is dominated by the information in the dips (limited information in the dips = limited masking release)
  • This suggests that masking release is determined by listeners' ability to decode the information in the dips (consistent with Kwon and Turner, 2001)
Future directions
• Investigate recognition of narrow band-pass filtered speech by HI listeners
  • Different degrees and configurations of HL
• Compare the perceptual weight functions of NH and HI listeners by using sentence recognition in different types of noise.
• Implications