Precedence-based speech segregation in a virtual auditory environment

Brungart, Simpson & Freyman (2005)

The Precedence Effect
• Sounds produced in areas with multiple surfaces give rise to reflections, so many copies of a sound reach a listener’s ears. The direct sound arrives first.
• With complex sounds like speech, early reflections tend to perceptually “fuse” with the direct sound (the “Haas” effect).
• The direct sound dominates localisation: this is the precedence effect.
• For two copies of a sound separated by a delay of D ms:
• D within ±0.5 ms: “summing localisation” (a single image whose perceived direction lies between the two sources)
• D > 1 ms: “precedence effect”
• D > 20 ms: beyond the “echo threshold” (two sources perceived)
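The delay regimes above can be written as a simple lookup. This is a minimal sketch assuming the slide's approximate boundaries and assuming the regimes depend only on the size of D, not its sign. The label "transition" for 0.5 to 1 ms is hypothetical, since the slide names no regime there, and real thresholds vary with the stimulus and the listener.

```python
def precedence_regime(delay_ms):
    """Classify a lead/lag delay D (ms) using the slide's rough boundaries.

    Assumption: only |D| matters; the sign just says which copy leads.
    """
    d = abs(delay_ms)
    if d <= 0.5:
        return "summing localisation"   # one image, between the two sources
    if d <= 1.0:
        return "transition"             # hypothetical label: no regime given on the slide
    if d <= 20.0:
        return "precedence effect"      # fused image dominated by the leading copy
    return "above echo threshold"       # two sources perceived
```

For example, the 4 ms lead/lag used by Freyman et al. (1999) falls in the "precedence effect" band, while the longest delays of Experiment 2 (±64 ms) lie above the slide's 20 ms echo threshold.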

Masking
• “…the amount of interference one stimulus can cause in the perception of another stimulus.” (Yost and Nielsen, 1977)
• The elevation in threshold of a target signal due to the presence of a masker.
• Energetic masking: “…masking that results from competition between target and masker at the periphery of the auditory system, i.e., overlapping excitation patterns in the cochlea or auditory nerve (AN).” (Durlach et al., 2003)
• Informational masking (also called non-energetic or central masking): “difficulty segregating the audible acoustic components of the target speech signal from the audible acoustic components of a perceptually similar speech masker.” (p. 3241)
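The threshold-elevation definition can be written as a formula, with T denoting the target's threshold in dB, measured with and without the masker:

```latex
M = T_{\text{masked}} - T_{\text{quiet}} \qquad \text{(all quantities in dB)}
```

With hypothetical numbers: if a target is just identifiable at 20 dB SPL in quiet but needs 50 dB SPL when the masker is present, then M = 50 − 20 = 30 dB.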

Some Assumptions
• Speech target.
• Random noise masker = purely energetic masking?
• Speech masker = energetic and informational masking?
• So if an experimental manipulation affects the amount of masking produced by the speech masker but not by the noise masker, is this due to a reduction in informational masking?
• Seems reasonable.
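The reasoning on this slide amounts to a small decision rule. This is a minimal sketch assuming release from masking is measured in dB as the baseline threshold minus the threshold in the test condition. The function names and the 1 dB `tolerance_db` for "no real change" are hypothetical, and no statistical test is involved.

```python
def release_from_masking(baseline_threshold_db, condition_threshold_db):
    """Release from masking in dB (positive = target identified at a lower SNR)."""
    return baseline_threshold_db - condition_threshold_db


def attribute_change(speech_change_db, noise_change_db, tolerance_db=1.0):
    """Apply the slide's logic to the change in masking for each masker type.

    tolerance_db is a hypothetical cut-off below which a change is ignored.
    """
    speech_affected = abs(speech_change_db) > tolerance_db
    noise_affected = abs(noise_change_db) > tolerance_db
    if speech_affected and not noise_affected:
        # Only the speech masker changed: attribute it to informational masking.
        return "informational"
    if noise_affected:
        # The noise masker changed too, so energetic masking is involved.
        return "energetic (informational not isolated)"
    return "no reliable change"
```

The rule inherits the slide's assumptions: it is only valid if the noise masker really produces no informational masking.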

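The Basic Experiment
• Freyman et al. (1999): free field. Brungart et al.: virtual auditory space over headphones.
• Condition labels give the target location, then the masker location(s): F = front, R = right. With two masker copies, the leading copy is listed first.
• F-F: baseline masking.
• F-R: release from masking regardless of the type of masker.
• F-RF: release from masking with a speech masker but NOT with a noise masker.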
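The RF and FR maskers are two copies of the same masker offset by D ms. This is a minimal sketch of building those copies, assuming the sign convention implied on the Interpretation slide: positive D delays the front copy (right leads, F-RF) and negative D delays the right copy (front leads, F-FR). In a virtual auditory environment each copy would then be filtered with the head-related transfer functions (HRTFs) for its direction; that step is omitted here, and `lead_lag_copies` is a hypothetical name.

```python
def lead_lag_copies(masker, delay_ms, fs_hz):
    """Return (front, right) copies of a masker offset by delay_ms.

    Positive delay_ms: front copy lags (F-RF). Negative: right copy lags (F-FR).
    Both outputs are zero-padded to the same length.
    """
    n = round(abs(delay_ms) * fs_hz / 1000.0)   # delay in whole samples
    delayed = [0.0] * n + list(masker)
    undelayed = list(masker) + [0.0] * n
    if delay_ms >= 0:
        return delayed, undelayed   # right leads, front lags
    return undelayed, delayed       # front leads, right lags
```

For instance, with a 44.1 kHz sample rate a 4 ms delay corresponds to about 176 samples.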

Experiment 1
• Adding a delayed copy of the noise masker at the front (F-RF) drops performance back to the F-F baseline.
• Adding a delayed copy of the speech masker at the front hardly makes any difference compared with F-R.
• Note: the speech recognition task used is resistant to energetic masking. Therefore it has a large informational masking component?

Interpretation
The precedence effect causes the listener to localise the RF masker off to the right, which helps auditory selective attention focus on the target speech, hence reducing informational masking. This doesn’t affect the noise masker because it produces no informational masking: adding a copy at the front just increases its energetic masking effect.
BUT the effect is also observed when the delay is negative, so that the first copy of the masker comes from the front (i.e. F-FR) (Freyman et al., 1999). Precedence should localise the masker to the front in this condition, so why is there a release from masking with a speech masker?

[Figure: F-FR and F-RF conditions compared with baseline; SNR axis, baseline levels of −8 dB and 0 dB marked]

Experiment 2
• What is the effect of varying the delay between the two masker presentations between −64 and +64 ms?
• For a noise masker? Very little.
• Some release from masking at delays which cause “notches” in the spectrum of the masker far enough apart to be resolved by the ear.
• For a single-speaker speech masker? Little effect of delay, positive or negative, until the “echo threshold” is exceeded.
• For a two-speaker speech masker? Much more variation, but still substantial release from masking, possibly including some release from energetic masking.
• Note that as speakers are added, multi-speaker babble approaches speech-shaped noise.
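The spectral “notches” arise because adding a sound to a delayed copy of itself acts as a comb filter. This is a minimal sketch under an idealised assumption: both copies reach the ear at equal level with no HRTF filtering, so the magnitude response is 2|cos(πfD)|, with nulls at f = (2k + 1)/(2D) spaced 1/D apart. With HRTF-filtered copies the levels differ at each ear and the notches would be shallower; `comb_notches` is a hypothetical name.

```python
def comb_notches(delay_ms, max_freq_hz):
    """Notch frequencies (Hz) of x(t) + x(t - D), up to max_freq_hz.

    Idealised: equal-level copies at one ear, so nulls fall at
    f = (2k + 1) / (2D), spaced 1/D Hz apart.
    """
    d = abs(delay_ms) / 1000.0   # delay in seconds
    if d == 0:
        return []                # no delay, no comb filtering
    notches = []
    k = 0
    while (2 * k + 1) / (2 * d) <= max_freq_hz:
        notches.append((2 * k + 1) / (2 * d))
        k += 1
    return notches
```

With D = 4 ms the notches fall at 125, 375, 625 and 875 Hz below 1 kHz (250 Hz apart); at 64 ms they are only 15.625 Hz apart, so the longer the delay, the less likely neighbouring notches are to be resolved by the ear.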

A Puzzle
• There is virtually no difference between positive and negative delays with the single-speaker masker, and not much of an advantage for positive delays with the two-speaker masker.
• What is going on here?
• Two possibilities (actually 3, but I’ll come back to this):
• 1) The effect is not based on perceived location, but on timbre or “source width”.
• 2) Even when the copy of the masker added to the front leads the one from the right, the one to the right “pulls” the perceived location over a little, so that the masker is perceived somewhere between front and right.
• If (2) is the case, then shifting the apparent location of the target to match that of the masker should abolish the release from masking.

Experiment 3
• Position of the target varied from 0° to 60° in 5° steps, at 7 different delay values between −4 and +4 ms.
• At D = 0 ms, performance curves are U-shaped for all 3 maskers: the masker is heard midway between front and right.
• For the two-speaker masker, when the front copy lags (positive D) by more than 0.5 ms, subjects do best when the target is located near the front (0°). As expected.
• When the front copy leads (negative D) by more than 0.5 ms, subjects do best when the target is located to the right.
• BUT the minimum performance is found around 10°, NOT at 0°.
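The target grid and the location of the performance minimum can be sketched as follows. This assumes equally spaced azimuths and refines the lowest grid point with a standard three-point parabolic fit. That fit is a swapped-in analysis technique, not one described on the slides, and both function names are hypothetical.

```python
def target_azimuths(start_deg=0, stop_deg=60, step_deg=5):
    """Target positions in degrees: 0 to 60 in 5-degree steps (13 positions)."""
    return list(range(start_deg, stop_deg + 1, step_deg))


def azimuth_of_minimum(azimuths, scores):
    """Estimate where a U-shaped performance curve bottoms out.

    Takes the lowest-scoring grid point, then refines it with a parabola
    through that point and its two neighbours (equal spacing assumed).
    """
    if len(azimuths) != len(scores) or len(azimuths) < 3:
        raise ValueError("need at least 3 azimuths with one score each")
    i = min(range(len(scores)), key=scores.__getitem__)
    if i == 0 or i == len(scores) - 1:
        return float(azimuths[i])      # minimum at the edge: no refinement
    y_l, y_0, y_r = scores[i - 1], scores[i], scores[i + 1]
    curvature = y_l - 2 * y_0 + y_r
    if curvature <= 0:
        return float(azimuths[i])      # flat bottom: keep the grid point
    step = azimuths[i + 1] - azimuths[i]
    # Vertex of the parabola through the three points.
    return azimuths[i] + step * (y_l - y_r) / (2 * curvature)
```

On a 5° grid a minimum reported “around 10°” could lie anywhere within a few degrees of that point; interpolation is one way to make that estimate less grid-bound.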

Conclusions
• This would appear to support the second hypothesis above.
• BUT why is there not a similar minimum around 50° when there is a positive delay?
• Also, energetic and informational masking do not seem to have been separated as completely by this paradigm as was first thought.
• AND no mention is made of the binaural masking level difference (BMLD):
• Whenever the interaural phase or level differences of the target signal are not the same as those of the masker, the ability to detect or identify the target improves.
• Inverting the signal at one ear gives better performance than delaying it, so this is not just segregation by spatial separation.
• Large BMLDs occur when target and masker are not subjectively well separated.
• Hearing is sensitive to the profile of interaural decorrelation across frequency.
• This could potentially explain why negative delays are as useful as positive delays: adding a delayed copy of the masker at the right changes the interaural correlation of the masker relative to the target.
• But this still wouldn’t explain the difference between speech and noise maskers…
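Interaural correlation is the quantity behind the BMLD argument. This is a minimal sketch assuming a single broadband measure: the peak normalised cross-correlation between the two ear signals over lags within ±1 ms. It has no auditory filterbank, so it cannot show the across-frequency profile mentioned above. The helper names and the 1 ms window are hypothetical choices.

```python
import math
import random


def white_noise(n, seed=0):
    """Gaussian white noise, seeded for repeatability."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def normalized_xcorr(left, right, lag):
    """Normalised cross-correlation of equal-length signals, pairing left[n] with right[n + lag]."""
    if lag >= 0:
        pairs = list(zip(left[: len(left) - lag], right[lag:]))
    else:
        pairs = list(zip(left[-lag:], right[: len(right) + lag]))
    num = sum(l * r for l, r in pairs)
    den = math.sqrt(sum(l * l for l, _ in pairs) * sum(r * r for _, r in pairs))
    return num / den if den > 0 else 0.0


def interaural_coherence(left, right, fs_hz, max_lag_ms=1.0):
    """Peak normalised cross-correlation over lags within +/- max_lag_ms."""
    max_lag = int(round(max_lag_ms * fs_hz / 1000.0))
    return max(normalized_xcorr(left, right, k) for k in range(-max_lag, max_lag + 1))
```

With identical signals at the two ears the coherence is 1. Inverting one ear gives a zero-lag correlation of −1 (the inverted-signal case on the slide). In a simplified version of the delayed-copy manipulation, adding a 4 ms delayed copy of a noise to one ear only, the broadband coherence falls to roughly 0.7. The real stimuli deliver both copies to both ears through HRTFs.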
