Using Fo and vocal-tract length to attend to one of two talkers. Chris Darwin, University of Sussex. With thanks to: Rob Hukin, John Culling, John Bird, MRC & EPSRC.
Review past work on the way that the human auditory system uses differences in Fo to separate two voices; Present new data on the use of Fo, vocal-tract length and their combination to allow listeners to select one of two simultaneous messages. Something old, something new, something borrowed, background blue.
Three types of experiment: Difference in Fo leads to: • binaural separation of sound sources • increase in intelligibility • ability to track a sound source over time.
Broadbent & Ladefoged (1957) • PAT-generated sentence “What did you say before that?” (F1, F2) • when Fo was the same, 125 Hz (either natural or monotone), listeners heard: one voice only 16/18; in one place 18/18 • when Fo was different, 125/135 Hz (monotone), listeners heard: two voices 15/18; in two places 12/18
B & L Conclusion Common Fo integrates • broadband frequency regions of a single voice • coming simultaneously to different ears into a single voice heard in one position.
Is a common Fo sufficient for fusion? • Broadbent & Ladefoged's stimuli used formant resonators with broad low-frequency skirts. • Sharply-filtered sounds sometimes give impression of two sound sources even with common Fo.
Dichotic: same Fo (apologies to Hideki) • Left ear: original -> PSOLA (Fo shifted by 0%) -> LP filter • Right ear: original -> PSOLA (Fo shifted by 0%) -> HP filter
Dichotic: different Fo • Left ear: original -> PSOLA (Fo shifted by -4%) -> LP filter • Right ear: original -> PSOLA (Fo shifted by +4%) -> HP filter
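For comparison with the semitone steps used later in the talk, the ±4% PSOLA shifts can be converted to semitones. This is a minimal sketch using the standard equal-tempered relation (12 semitones per octave); the function name is hypothetical and not from the slides.

```python
import math

def percent_to_semitones(percent):
    """Convert a percentage change in Fo to a signed interval in semitones."""
    return 12.0 * math.log2(1.0 + percent / 100.0)

up = percent_to_semitones(+4.0)    # right ear, Fo raised by 4%
down = percent_to_semitones(-4.0)  # left ear, Fo lowered by 4%
print(f"+4%: {up:+.2f} semitones")
print(f"-4%: {down:+.2f} semitones")
print(f"separation between ears: {up - down:.2f} semitones")
```

On this reckoning the two ears differ by a little under one and a half semitones.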
Complementary LP/HP filters Variable bandwidth
Dichotic Results (female voice) Filter X-over @ 1 kHz
Low-frequency overlap cf natural ILDs higher for low frequency sounds
Summary But what about Fo’s ability to separate different voices? (original B & L question)
Three types of experiment: Difference in Fo leads to: • binaural separation of sound sources • increase in intelligibility • ability to track a sound source over time.
∆Fo improves identification (panels: double vowels, sentences) • double vowels: improvement is over by 1 semitone • sentences: improve for longer
Mechanisms of DFo improvement • A. Global: Across formant grouping by Fo (as originally conceived by B & L) • B. Local: Better definition of individual formants - especially F1 where harmonics resolved At small ∆Fos B more important than A for double vowels (Culling & Darwin, JASA 1993). Also true for sentences?
∆Fo between two sentences (Bird & Darwin 1998; after Brokx & Nooteboom, 1982) • Two sentences (same talker) • only voiced consonants (with very few stops) • Target sentence Fo = 140 Hz • Masking sentence Fo = 140 Hz ± 0, 1, 2, 5, 10 semitones • Task: write down target sentence • Replicates & extends Brokx & Nooteboom
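The masker Fo values implied by "140 Hz ± 0, 1, 2, 5, 10 semitones" follow from the equal-tempered semitone ratio of 2^(1/12). A minimal sketch; the function and constant names are illustrative only.

```python
def shift_semitones(f0_hz, semitones):
    """Return f0_hz shifted by a signed number of equal-tempered semitones."""
    return f0_hz * 2.0 ** (semitones / 12.0)

TARGET_F0_HZ = 140.0          # target sentence Fo, from the slide
OFFSETS = (0, 1, 2, 5, 10)    # masker offsets in semitones, from the slide

for n in OFFSETS:
    below = shift_semitones(TARGET_F0_HZ, -n)
    above = shift_semitones(TARGET_F0_HZ, +n)
    print(f"{n:>2} semitones: {below:6.1f} Hz (below) / {above:6.1f} Hz (above)")
```

The same relation accounts for the 100, 106, 112, 133 and 178 Hz values used with a 100 Hz reference in the chimeric-sentence conditions.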
Chimeric sentences (Bird & Darwin, Grantham Meeting 1998) • Fo pairs (Hz): 100-100, 100-106, 100-112, 100-133, 100-178 • [Figure: conditions "Fo below 800 Hz" and "Fo above 800 Hz"]
Paired sentences' Fos (Hz)
Condition          First sentence           Second sentence
                   Low Pass   High Pass     Low Pass   High Pass
Normal             100        100           112        112
Same Fo in High    100        100           112        100
Same Fo in Low     100        100           100        112
Swapped            100        112           112        100     (gives wrong grouping)
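The four pairings can be written down as data, which makes it easy to see which band carries an Fo difference in each condition. This is a sketch only: the column order (first sentence low/high band, then second sentence low/high band) is a reading of the flattened slide table, and the names below are hypothetical.

```python
import math

# (first sentence (low-band Fo, high-band Fo), second sentence (low, high)), in Hz.
CONDITIONS = {
    "Normal":          ((100, 100), (112, 112)),
    "Same Fo in High": ((100, 100), (112, 100)),
    "Same Fo in Low":  ((100, 100), (100, 112)),
    "Swapped":         ((100, 112), (112, 100)),
}

def semitones(f_a, f_b):
    """Unsigned interval between two frequencies, in semitones."""
    return abs(12.0 * math.log2(f_b / f_a))

def describe(condition):
    """Summarise where the Fo difference lies for one condition."""
    (s1_low, s1_high), (s2_low, s2_high) = CONDITIONS[condition]
    return {
        "low_band_dfo": round(semitones(s1_low, s2_low), 2),
        "high_band_dfo": round(semitones(s1_high, s2_high), 2),
        # True when each sentence has one Fo across both bands.
        "consistent_within_sentence": s1_low == s1_high and s2_low == s2_high,
    }

for name in CONDITIONS:
    print(name, describe(name))
```

Under this reading, "Swapped" keeps an Fo difference in both bands but pairs each sentence's low band with the other sentence's Fo in the high band.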
Segregating sentence pairs by Fo • all the action is in the low frequency region (<800 Hz) • no strong evidence of across-formant grouping
Adding Fo-swapped • inappropriate pairing of Fo only detrimental above 4 semitones
Summary of Fo-differences • Across-formant grouping only significant for large Fo differences (> ~ 4 semitones) • Most of the improvement with small Fo differences happens in the F1 frequency-region.
Another caveat for autocorrelation • Improvement in identification of double vowels for small ∆Fos is about as good when each vowel is made up of alternating harmonics of the two Fos (Culling & Darwin) • Autocorrelation would pull out completely wrong envelopes.
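Listing the components shows why a purely Fo-based grouper is misled by such stimuli. A minimal sketch, assuming each vowel takes its odd-numbered harmonics from one Fo and its even-numbered harmonics from the other; that construction, the 100 Hz and 106 Hz values, the 1 kHz limit and the function names are illustrative assumptions, not details from the slide.

```python
def interleaved_vowel(f0_odd, f0_even, max_hz):
    """Return (harmonic_number, source_f0, frequency) tuples for one vowel.

    Odd-numbered harmonics come from f0_odd, even-numbered from f0_even.
    """
    components = []
    top = int(max_hz // min(f0_odd, f0_even))
    for k in range(1, top + 1):
        f0 = f0_odd if k % 2 == 1 else f0_even
        freq = k * f0
        if freq <= max_hz:
            components.append((k, f0, freq))
    return components

vowel_a = interleaved_vowel(100, 106, 1000)
vowel_b = interleaved_vowel(106, 100, 1000)

def by_f0(f0):
    """Collect, across both vowels, every component on the harmonic series of f0."""
    return sorted(
        (freq, name)
        for name, vowel in (("A", vowel_a), ("B", vowel_b))
        for _, source_f0, freq in vowel
        if source_f0 == f0
    )

for freq, name in by_f0(100):
    print(f"{freq:5d} Hz belongs to vowel {name}")
```

Grouping by the 100 Hz series collects components that alternate between the two vowels, so the spectral envelope recovered for "the 100 Hz source" belongs to neither vowel.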
No simultaneous effect of FM • Although separation by Fo shows strong effects, there is no detectable effect of simultaneous separation by different frequency modulations (FM) of Fo. • Listeners are unable to discriminate correlated from uncorrelated FM in simultaneous inharmonic sine waves (Carlyon).
Summary of ∆Fo effects in separating competing voices • Intelligibility increased by small ∆Fo only in F1 region (and harmonic alternation tolerated)... • … but not by ∆Fo in only higher freq. region. • Across-formant consistency of Fo only important at larger ∆Fo • FM produces no additional separation
Three types of experiment: Difference in Fo leads to: • binaural separation of sound sources • increase in intelligibility • ability to track a sound source over time.
CRM task (tracking a sound source) (Bolia et al., 2000) • 2 simultaneous sentences each of form • Ready (Call Sign) go to (Color) (Number) now. • Same talker (TT); Same Sex (TS); Different sex (TD) • Target denoted by Call-Sign "Baron" • 8 Talkers in corpus, 2048 tokens
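The corpus size on the slide can be checked by enumerating the sentence frame. This is a sketch: the word lists below (8 call signs, 4 colors, 8 numbers) are taken from the published description of the CRM corpus by Bolia et al. (2000), not from the slide itself, and the variable names are illustrative.

```python
from itertools import product

CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle", "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["blue", "green", "red", "white"]
NUMBERS = list(range(1, 9))
N_TALKERS = 8  # from the slide

# Every sentence one talker can produce with the fixed carrier phrase.
sentences = [
    f"Ready {call_sign} go to {color} {number} now."
    for call_sign, color, number in product(CALL_SIGNS, COLORS, NUMBERS)
]

# Sentences a listener must follow when the target call sign is "Baron".
targets = [s for s in sentences if s.startswith("Ready Baron ")]

print(len(sentences), "sentences per talker")
print(len(sentences) * N_TALKERS, "tokens in the corpus")
print(len(targets), "possible target sentences per talker")
```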
CRM task (Bolia et al., 2000) Listeners responded by selecting the appropriate colored digit with the computer mouse
Fo contours for 2 individuals • Individuals with the most constant Fo contours show the most improvement with ∆Fo
Effect of joint change of Fo and VT Original: male
Effect of joint change of Fo and VT Original: female
Superadditivity of ∆Fo and ∆VT [Figure: actual d' against predicted d', both axes 0.00 to 1.50; legend: male, female] • ∆Fo & ∆VT superadditive • … and still less than real different-sex talkers
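"Superadditive" here means that the d' measured with both cues exceeds what the single-cue d' values predict. The slide does not say how the predicted d' was computed, so the sketch below shows two common candidate rules as assumptions: a simple sum, and the Euclidean sum used for independent cues. All names and the example numbers are hypothetical, not data from the talk.

```python
import math

def predicted_sum(d_fo, d_vt):
    """Prediction if the two single-cue d' values simply add (assumed rule)."""
    return d_fo + d_vt

def predicted_independent(d_fo, d_vt):
    """Prediction for two independent cues combined optimally (assumed rule)."""
    return math.hypot(d_fo, d_vt)

def is_superadditive(actual, d_fo, d_vt, rule=predicted_independent):
    """True when the measured joint d' exceeds the chosen prediction."""
    return actual > rule(d_fo, d_vt)

# Hypothetical illustration only: single-cue d' of 0.3 and 0.4, joint d' of 0.9.
print(predicted_sum(0.3, 0.4))
print(predicted_independent(0.3, 0.4))
print(is_superadditive(0.9, 0.3, 0.4))
print(is_superadditive(0.9, 0.3, 0.4, rule=predicted_sum))
```

A point lying above the diagonal in a plot of actual against predicted d' corresponds to `is_superadditive` returning True for the prediction rule used.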
Conclusions • Same Fo not a sufficient condition for dichotic fusion for complementarily filtered speech. • Intelligibility increase for small ∆Fo confined to F1 region. Only across-formant for larger ∆Fo. • Fo & VT-size useful for tracking sources across time. Superadditive.