Using Fo and vocal-tract length to attend to one of two talkers.

Chris Darwin, University of Sussex. With thanks to: Rob Hukin, John Culling, John Bird, MRC & EPSRC.

Presentation Transcript


  1. Using Fo and vocal-tract length to attend to one of two talkers. Chris Darwin, University of Sussex • With thanks to: • Rob Hukin • John Culling • John Bird • MRC & EPSRC

  2. Review past work on the way that the human auditory system uses differences in Fo to separate two voices; Present new data on the use of Fo, vocal-tract length and their combination to allow listeners to select one of two simultaneous messages. Something old, something new, something borrowed, background blue.

  3. Three types of experiment: Difference in Fo leads to: • binaural separation of sound sources • increase in intelligibility • ability to track a sound source over time.

  4. Three types of experiment: Difference in Fo leads to: • binaural separation of sound sources • increase in intelligibility • ability to track a sound source over time.

  5. Broadbent & Ladefoged (1957) • PAT-generated sentence “What did you say before that?” F1 F2 • when Fo the same, 125 Hz (either natural or monotone), • listeners heard: • one voice only 16/18 • in one place 18/18 • when Fo different, 125/135 Hz (monotone), • listeners heard: • two voices 15/18 • in two places 12/18

  6. B & L Conclusion Common Fo integrates • broadband frequency regions of a single voice • coming simultaneously to different ears into a single voice heard in one position.

  7. Is a common Fo sufficient for fusion? • Broadbent & Ladefoged's stimuli used formant resonators with broad low-frequency skirts. • Sharply-filtered sounds sometimes give impression of two sound sources even with common Fo.

  8. Formant T(f) & abs difference

  9. Dichotic: same Fo (apologies to Hideki) • original -> PSOLA (Fo change 0%) -> LP filter -> Left ear • original -> PSOLA (Fo change 0%) -> HP filter -> Right ear

  10. Dichotic: different Fo • original -> PSOLA (Fo change -4%) -> LP filter -> Left ear • original -> PSOLA (Fo change +4%) -> HP filter -> Right ear
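
The ±4% Fo shifts on this slide can be put on the semitone scale used later in the talk. The following is a minimal sketch of that conversion; the function names are illustrative and not taken from the slides.

```python
import math


def percent_to_semitones(percent: float) -> float:
    """Convert a percentage change in Fo to a signed interval in semitones."""
    return 12.0 * math.log2(1.0 + percent / 100.0)


def semitones_between(percent_a: float, percent_b: float) -> float:
    """Interval between two Fo values, each given as a percentage shift of the same original."""
    return percent_to_semitones(percent_b) - percent_to_semitones(percent_a)


# Shifts quoted on the slide: -4% in the left ear, +4% in the right ear.
left_shift, right_shift = -4.0, 4.0
delta = semitones_between(left_shift, right_shift)
print(f"Fo difference between ears: {delta:.2f} semitones")
```

Under this conversion the two ears differ by roughly 1.4 semitones.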

  11. Complementary LP/HP filters Variable bandwidth
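
As a rough illustration of what complementary low-pass and high-pass filters with a variable transition bandwidth could look like, the sketch below defines a pair of amplitude gains that always sum to one. The raised-cosine transition shape, the 400 Hz bandwidth, and the function names are assumptions made for illustration; the slides do not specify the filter design. The 1 kHz crossover matches the value quoted on the results slide.

```python
import math


def lowpass_gain(freq_hz: float, crossover_hz: float, bandwidth_hz: float) -> float:
    """Low-pass amplitude gain with a raised-cosine transition (assumed shape).

    The transition band is centred on crossover_hz and is bandwidth_hz wide.
    """
    lower = crossover_hz - bandwidth_hz / 2.0
    upper = crossover_hz + bandwidth_hz / 2.0
    if freq_hz <= lower:
        return 1.0
    if freq_hz >= upper:
        return 0.0
    position = (freq_hz - lower) / bandwidth_hz  # 0 at lower edge, 1 at upper edge
    return 0.5 * (1.0 + math.cos(math.pi * position))


def highpass_gain(freq_hz: float, crossover_hz: float, bandwidth_hz: float) -> float:
    """Complementary high-pass gain: whatever the low-pass filter removes."""
    return 1.0 - lowpass_gain(freq_hz, crossover_hz, bandwidth_hz)


CROSSOVER_HZ = 1000.0  # crossover quoted on the results slide
BANDWIDTH_HZ = 400.0   # hypothetical transition bandwidth

for freq in (500.0, 900.0, 1000.0, 1100.0, 2000.0):
    lp = lowpass_gain(freq, CROSSOVER_HZ, BANDWIDTH_HZ)
    hp = highpass_gain(freq, CROSSOVER_HZ, BANDWIDTH_HZ)
    print(f"{freq:7.0f} Hz  left (LP) {lp:.2f}  right (HP) {hp:.2f}")
```

Widening `BANDWIDTH_HZ` increases the frequency range that reaches both ears, which is the variable manipulated on this slide.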

  12. Dichotic Results (female voice) Filter X-over @ 1 kHz

  13. Higher filter cut-offs need wider bandwidths Same Fo

  14. Low-frequency overlap cf natural ILDs higher for low frequency sounds

  15. Summary But what about Fo’s ability to separate different voices? (original B & L question)

  16. Three types of experiment: Difference in Fo leads to: • binaural separation of sound sources • increase in intelligibility • ability to track a sound source over time.

  17. ∆Fo improves identification (double vowels; sentences) • double vowels: over by 1 semitone • sentences: improve for longer

  18. Mechanisms of ∆Fo improvement • A. Global: Across-formant grouping by Fo (as originally conceived by B & L) • B. Local: Better definition of individual formants, especially F1, where harmonics are resolved • At small ∆Fos, B is more important than A for double vowels (Culling & Darwin, JASA 1993). Also true for sentences?

  19. ∆Fo between two sentences (Bird & Darwin 1998; after Brokx & Nooteboom, 1982) • Two sentences (same talker) • only voiced consonants • (with very few stops) • Masking sentence Fo = 140 Hz ± 0, 1, 2, 5, 10 semitones • Target sentence Fo = 140 Hz • Task: write down target sentence • Replicates & extends Brokx & Nooteboom
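
The masker Fo values implied by "140 Hz ± 0, 1, 2, 5, 10 semitones" follow from the equal-tempered semitone ratio of 2^(1/12). A small sketch, with illustrative names:

```python
def shift_by_semitones(fo_hz: float, semitones: float) -> float:
    """Return fo_hz shifted by a signed number of equal-tempered semitones."""
    return fo_hz * 2.0 ** (semitones / 12.0)


TARGET_FO_HZ = 140.0              # target sentence Fo from the slide
SEMITONE_STEPS = [0, 1, 2, 5, 10]  # masker offsets from the slide

# Map each offset to the (lower, higher) masker Fo in Hz.
masker_fos = {
    step: (
        shift_by_semitones(TARGET_FO_HZ, -step),
        shift_by_semitones(TARGET_FO_HZ, step),
    )
    for step in SEMITONE_STEPS
}

for step, (below, above) in masker_fos.items():
    print(f"±{step:>2} semitones: {below:6.1f} Hz / {above:6.1f} Hz")
```

Applying the same function to a 100 Hz base gives about 106, 112, 133 and 178 Hz for 1, 2, 5 and 10 semitones, which matches the Fo pairs listed on the chimeric-sentences slide.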

  20. Chimeric sentences (Bird & Darwin, Grantham Meeting 1998) • Fo pairs (Hz): 100-100, 100-106, 100-112, 100-133, 100-178 • Fo below 800 Hz • Fo above 800 Hz

  21. Paired sentences' Fos (Hz; Low Pass / High Pass for each sentence of the pair) • Normal: 100 / 100 and 112 / 112 • Same Fo in High: 100 / 100 and 112 / 100 • Same Fo in Low: 100 / 100 and 100 / 112 • Swapped: 100 / 112 and 112 / 100 (gives wrong grouping)
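
The four conditions can be written down as data to make explicit which frequency band carries an Fo difference in each. This sketch assumes the four numbers per condition read as low-pass and high-pass Fo for the first sentence, then low-pass and high-pass Fo for the second; the type and function names are illustrative.

```python
from typing import Dict, NamedTuple, Tuple


class SentenceFo(NamedTuple):
    low_hz: int   # Fo of the low-pass region of one sentence
    high_hz: int  # Fo of the high-pass region of the same sentence


# Assumed reading order of the slide: (sentence 1, sentence 2).
CONDITIONS: Dict[str, Tuple[SentenceFo, SentenceFo]] = {
    "Normal": (SentenceFo(100, 100), SentenceFo(112, 112)),
    "Same Fo in High": (SentenceFo(100, 100), SentenceFo(112, 100)),
    "Same Fo in Low": (SentenceFo(100, 100), SentenceFo(100, 112)),
    "Swapped": (SentenceFo(100, 112), SentenceFo(112, 100)),
}


def describe(pair: Tuple[SentenceFo, SentenceFo]) -> Dict[str, bool]:
    """Report where the two sentences differ in Fo and whether each is self-consistent."""
    first, second = pair
    return {
        "low_band_differs": first.low_hz != second.low_hz,
        "high_band_differs": first.high_hz != second.high_hz,
        "consistent_within_sentence": (
            first.low_hz == first.high_hz and second.low_hz == second.high_hz
        ),
    }


for name, pair in CONDITIONS.items():
    print(name, describe(pair))
```

Only the "Swapped" condition has an Fo difference in both bands while pairing each sentence's bands with the wrong Fo, which is why it is the test of across-formant grouping.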

  22. Segregating sentence pairs by Fo • all the action is in the low frequency region (<800 Hz) • no strong evidence of across-formant grouping

  23. Adding Fo-swapped • inappropriate pairing of Fo only detrimental above 4 semitones

  24. Summary of Fo-differences • Across-formant grouping only significant for large Fo differences (> ~ 4 semitones) • Most of the improvement with small Fo differences happens in the F1 frequency-region.

  25. Another caveat for auto-correlation • Improvement in identification of double vowels for small ∆Fos is about as good when each vowel is made up of alternating harmonics of the two Fos (Culling & Darwin) • Autocorrelation would pull out completely wrong envelopes.
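
To make the alternating-harmonic manipulation concrete, the sketch below lists the component frequencies of two such vowels. It only models which Fo each harmonic number is drawn from, not the vowels' spectral envelopes. Which vowel takes the odd-numbered harmonics, the 100 and 106 Hz values, and the 1 kHz upper limit are assumptions for illustration.

```python
from typing import List, Tuple


def alternating_harmonics(
    fo_odd_hz: float, fo_even_hz: float, max_hz: float
) -> List[Tuple[int, float]]:
    """List (harmonic number, frequency) pairs up to max_hz.

    Odd-numbered harmonics are multiples of fo_odd_hz and
    even-numbered harmonics are multiples of fo_even_hz.
    """
    components = []
    n = 1
    while n * min(fo_odd_hz, fo_even_hz) <= max_hz:
        fo = fo_odd_hz if n % 2 == 1 else fo_even_hz
        if n * fo <= max_hz:
            components.append((n, n * fo))
        n += 1
    return components


FO_A, FO_B = 100.0, 106.0  # hypothetical pair, about one semitone apart
MAX_HZ = 1000.0            # hypothetical upper limit for the listing

vowel_1 = alternating_harmonics(FO_A, FO_B, MAX_HZ)  # odd from FO_A, even from FO_B
vowel_2 = alternating_harmonics(FO_B, FO_A, MAX_HZ)  # the complementary assignment

print("vowel 1:", [freq for _, freq in vowel_1])
print("vowel 2:", [freq for _, freq in vowel_2])
```

Between them the two vowels contain every harmonic of both Fos, but neither vowel is harmonic on a single Fo, so grouping components by a common period would assign them to the wrong vowel.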

  26. No simultaneous effect of FM • Although separation by Fo shows strong effects, there is no detectable effect of simultaneous separation by different Frequency Modulations of Fo. • Listeners unable to discriminate correlated from uncorrelated FM in simultaneous inharmonic sine waves (Carlyon).

  27. Summary of ∆Fo effects in separating competing voices • Intelligibility increased by small ∆Fo only in F1 region (and harmonic alternation tolerated)... • … but not by ∆Fo in only higher freq. region. • Across-formant consistency of Fo only important at larger ∆Fo • FM produces no additional separation

  28. Three types of experiment: Difference in Fo leads to: • binaural separation of sound sources • increase in intelligibility • ability to track a sound source over time.

  29. CRM task (tracking a sound source) (Bolia et al., 2000) • 2 simultaneous sentences each of form • Ready (Call Sign) go to (Color) (Number) now. • Same talker (TT); Same Sex (TS); Different sex (TD) • Target denoted by Call-Sign "Baron" • 8 Talkers in corpus, 2048 tokens
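
The corpus size quoted above follows from the sentence frame. The sketch below enumerates the frame and draws one two-talker trial. The call-sign, color and number lists come from the published description of the CRM corpus rather than from these slides, and the rule that the masker differs from the target in call sign, color and number is an assumption about the trial design.

```python
import random
from itertools import product

# Word lists from the published CRM corpus description (not shown on the slides).
CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle", "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["blue", "green", "red", "white"]
NUMBERS = list(range(1, 9))
N_TALKERS = 8
TARGET_CALL_SIGN = "Baron"


def crm_sentence(call_sign: str, color: str, number: int) -> str:
    """Fill the CRM sentence frame given on the slide."""
    return f"Ready {call_sign} go to {color} {number} now."


# Every talker speaks every combination of call sign, color and number.
tokens = [
    (talker, crm_sentence(call_sign, color, number))
    for talker, call_sign, color, number in product(
        range(N_TALKERS), CALL_SIGNS, COLORS, NUMBERS
    )
]
print(len(tokens))


def make_trial(rng: random.Random) -> tuple:
    """Draw a target and a masker that differ in call sign, color and number (assumed rule)."""
    target_color, masker_color = rng.sample(COLORS, 2)
    target_number, masker_number = rng.sample(NUMBERS, 2)
    masker_call = rng.choice([c for c in CALL_SIGNS if c != TARGET_CALL_SIGN])
    target = crm_sentence(TARGET_CALL_SIGN, target_color, target_number)
    masker = crm_sentence(masker_call, masker_color, masker_number)
    return target, masker


target, masker = make_trial(random.Random(0))
print(target)
print(masker)
```

The listener's task is then to report the color and number from the sentence containing "Baron".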

  30. CRM task (Bolia et al., 2000) Listeners responded by selecting the appropriate colored digit with the computer mouse

  31. CRM task results (Brungart et al)

  32. Effect of change in Fo

  33. Effect of change in Fo

  34. Fo contours for 2 individuals • Individuals with the most constant Fo contours show the most improvement with ∆Fo

  35. Effect of change of VT

  36. Effect of joint change of Fo and VT Original: male

  37. Effect of joint change of Fo and VT Original: female

  38. Superadditivity of ∆Fo and ∆VT [Figure: actual d' against predicted d', both axes from 0.00 to 1.50; male and female original talkers] • ∆Fo & ∆VT superadditive • … and still less than real different-sex talkers
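
The slide does not say how the predicted d' was obtained. As one possible reading, the sketch below uses the standard yes/no definition of d' and the conventional prediction for two independent cues, the root of the summed squares, and calls a result superadditive when the measured d' exceeds that prediction. Both the prediction rule and the numbers are assumptions for illustration, not data from the talk.

```python
import math
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Standard yes/no sensitivity index: z(hits) - z(false alarms).

    Rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def predicted_independent(d_fo: float, d_vt: float) -> float:
    """Combined d' expected if the two cues contribute independently (assumed rule)."""
    return math.hypot(d_fo, d_vt)


def is_superadditive(actual: float, d_fo: float, d_vt: float) -> bool:
    """True when the measured combined d' exceeds the independent-cue prediction."""
    return actual > predicted_independent(d_fo, d_vt)


# Hypothetical values, chosen only to exercise the functions.
d_fo_only, d_vt_only, d_both = 0.3, 0.4, 0.8
print(f"predicted {predicted_independent(d_fo_only, d_vt_only):.2f}, actual {d_both:.2f}")
print("superadditive:", is_superadditive(d_both, d_fo_only, d_vt_only))
```

A different prediction rule, such as a plain sum of the single-cue d' values, would set a higher bar for calling the combination superadditive.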

  39. Conclusions • Same Fo not a sufficient condition for dichotic fusion for complementarily filtered speech. • Intelligibility increase for small ∆Fo confined to F1 region. Only across-formant for larger ∆Fo. • Fo & VT-size useful for tracking sources across time. Superadditive.
