Using Fo and vocal-tract length to attend to one of two talkers.

Chris Darwin, University of Sussex • With thanks to: • Rob Hukin • John Culling • John Bird • MRC & EPSRC

Review past work on the way that the human auditory system uses differences in Fo to separate two voices; Present new data on the use of Fo, vocal-tract length and their combination to allow listeners to select one of two simultaneous messages. Something old, something new, something borrowed, background blue.

Three types of experiment: Difference in Fo leads to: • binaural separation of sound sources • increase in intelligibility • ability to track a sound source over time.

Broadbent & Ladefoged (1957) • PAT-generated sentence “What did you say before that?” F1 F2 • when Fo the same, 125 Hz (either natural or monotone), • listeners heard: • one voice only 16/18 • in one place 18/18 • when Fo different, 125/135 Hz (monotone), • listeners heard: • two voices 15/18 • in two places 12/18

B & L Conclusion Common Fo integrates • broadband frequency regions of a single voice • coming simultaneously to different ears into a single voice heard in one position.

Is a common Fo sufficient for fusion? • Broadbent & Ladefoged's stimuli used formant resonators with broad low-frequency skirts. • Sharply-filtered sounds sometimes give impression of two sound sources even with common Fo.

Formant T(f) & abs difference

Dichotic: same Fo (apologies to Hideki) • original -> PSOLA (Fo 0%) -> LP filter -> left ear • original -> PSOLA (Fo 0%) -> HP filter -> right ear

Dichotic: different Fo • original -> PSOLA (Fo -4%) -> LP filter -> left ear • original -> PSOLA (Fo +4%) -> HP filter -> right ear
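The size of the Fo difference in the "different Fo" condition can be expressed in semitones. The following is a minimal sketch, assuming that the -4% and +4% PSOLA shifts are both relative to the same original Fo; the function name is illustrative and not from the talk.

```python
import math

def percent_shifts_to_semitones(low_pct, high_pct):
    """Semitone interval between two Fo values, each given as a
    percent shift of the same original Fo (an assumption here)."""
    ratio = (1.0 + high_pct / 100.0) / (1.0 + low_pct / 100.0)
    return 12.0 * math.log2(ratio)

# Left ear shifted by -4%, right ear by +4%
delta = percent_shifts_to_semitones(-4.0, 4.0)
print(f"{delta:.2f} semitones")
```

Under that assumption the two ears differ by a little under one and a half semitones.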

Complementary LP/HP filters Variable bandwidth

Dichotic Results (female voice) Filter X-over @ 1 kHz

Higher filter cut-offs need wider bandwidths Same Fo

Low-frequency overlap cf natural ILDs higher for low frequency sounds

Summary But what about Fo’s ability to separate different voices? (original B & L question)

∆Fo improves identification (double vowels, sentences) • double vowels: improvement is over by 1 semitone • sentences: improvement continues for larger ∆Fo

Mechanisms of ∆Fo improvement • A. Global: across-formant grouping by Fo (as originally conceived by B & L) • B. Local: better definition of individual formants, especially F1, where harmonics are resolved • At small ∆Fos, B is more important than A for double vowels (Culling & Darwin, JASA 1993). Also true for sentences?

∆Fo between two sentences (Bird & Darwin 1998; after Brokx & Nooteboom, 1982) • Two sentences (same talker) • only voiced consonants (with very few stops) • Target sentence Fo = 140 Hz • Masking sentence Fo = 140 Hz ± 0, 1, 2, 5, 10 semitones • Task: write down target sentence • Replicates & extends Brokx & Nooteboom
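The masker Fo values implied by the semitone offsets follow from the equal-tempered relation, in which one semitone is a frequency ratio of 2^(1/12). A minimal sketch, assuming the offsets are applied to the 140 Hz target Fo on that scale; the helper name is illustrative.

```python
def shift_semitones(f0_hz, semitones):
    """Return f0_hz shifted by a signed number of equal-tempered semitones."""
    return f0_hz * 2.0 ** (semitones / 12.0)

TARGET_F0_HZ = 140.0  # target sentence Fo from the slide

for st in (0, 1, 2, 5, 10):
    up = shift_semitones(TARGET_F0_HZ, st)
    down = shift_semitones(TARGET_F0_HZ, -st)
    print(f"±{st:2d} semitones: {down:6.1f} Hz / {up:6.1f} Hz")
```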

Chimeric sentences (Bird & Darwin, Grantham Meeting 1998) • Fo pairs (Hz): 100-100, 100-106, 100-112, 100-133, 100-178 • Conditions: Fo below 800 Hz; Fo above 800 Hz

Paired sentences' Fos (Low Pass / High Pass for each sentence of the pair) • Normal: 100 / 100 and 112 / 112 • Same Fo in High: 100 / 100 and 112 / 100 • Same Fo in Low: 100 / 100 and 100 / 112 • Swapped: 100 / 112 and 112 / 100 (gives wrong grouping)

Segregating sentence pairs by Fo • all the action is in the low frequency region (<800 Hz) • no strong evidence of across-formant grouping

Adding Fo-swapped • inappropriate pairing of Fo only detrimental above 4 semitones

Summary of Fo-differences • Across-formant grouping only significant for large Fo differences (> ~ 4 semitones) • Most of the improvement with small Fo differences happens in the F1 frequency-region.

Another caveat for autocorrelation • Improvement in identification of double vowels for small ∆Fos is about as good when each vowel is made up of alternating harmonics of the two Fos (Culling & Darwin) • Autocorrelation would pull out completely wrong envelopes.

No simultaneous effect of FM • Although separation by Fo shows strong effects, there is no detectable effect of simultaneous separation by different frequency modulations of Fo. • Listeners are unable to discriminate correlated from uncorrelated FM in simultaneous inharmonic sine waves (Carlyon).

Summary of ∆Fo effects in separating competing voices • Intelligibility increased by small ∆Fo only in F1 region (and harmonic alternation tolerated)... • … but not by ∆Fo in only higher freq. region. • Across-formant consistency of Fo only important at larger ∆Fo • FM produces no additional separation

CRM task (tracking a sound source) (Bolia et al., 2000) • 2 simultaneous sentences each of form • Ready (Call Sign) go to (Color) (Number) now. • Same talker (TT); Same Sex (TS); Different sex (TD) • Target denoted by Call-Sign "Baron" • 8 Talkers in corpus, 2048 tokens
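The corpus size on this slide can be checked by enumerating the sentence frame. This is a minimal sketch: the call-sign, colour and number sets below are taken from the published description of the CRM corpus rather than from these slides, and the talker labels are placeholders.

```python
from itertools import product

TALKERS = range(8)  # placeholder talker indices; 8 talkers per the slide
CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle",
              "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["blue", "green", "red", "white"]
NUMBERS = range(1, 9)

def sentence(call_sign, color, number):
    """Fill the CRM sentence frame shown on the slide."""
    return f"Ready {call_sign} go to {color} {number} now."

# One token per talker x call sign x colour x number combination
tokens = [(talker, sentence(cs, col, num))
          for talker, cs, col, num in product(TALKERS, CALL_SIGNS, COLORS, NUMBERS)]

# Probability of guessing both colour and number correctly by chance
chance = 1.0 / (len(COLORS) * len(NUMBERS))

print(len(tokens), chance)
```

With these sets the enumeration gives the 2048 tokens quoted on the slide, and guessing both colour and number has a chance rate of 1 in 32.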

CRM task (Bolia et al., 2000) Listeners responded by selecting the appropriate colored digit with the computer mouse

CRM task results (Brungart et al)

Effect of change in Fo

Fo contours for 2 individuals • Individuals with the most constant Fo contours show the most improvement with ∆Fo

Effect of change of VT

Effect of joint change of Fo and VT Original: male

Effect of joint change of Fo and VT Original: female

Superadditivity of ∆Fo and ∆VT [Figure: actual d' against predicted d', both axes 0.00 to 1.50, for male and female original talkers] • ∆Fo & ∆VT superadditive • … and still less than real different-sex talkers
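The slides do not say how the predicted d' was derived from the single-cue conditions. As a hedged illustration only, the sketch below shows two common combination rules (a simple sum, and the root-sum-of-squares rule for independent cues) and a test for superadditivity against either. All function names and the numeric values are hypothetical and are not data from the talk.

```python
import math

def predicted_additive(d_fo, d_vt):
    """Prediction assuming the two single-cue d' values simply add."""
    return d_fo + d_vt

def predicted_independent(d_fo, d_vt):
    """Prediction assuming independent cues (root sum of squares)."""
    return math.hypot(d_fo, d_vt)

def is_superadditive(d_joint, d_fo, d_vt, rule=predicted_additive):
    """True if the joint d' exceeds the prediction under the chosen rule."""
    return d_joint > rule(d_fo, d_vt)

# Hypothetical single-cue and joint d' values, for illustration only
d_fo, d_vt, d_joint = 0.4, 0.5, 1.2
print(is_superadditive(d_joint, d_fo, d_vt))
print(is_superadditive(d_joint, d_fo, d_vt, rule=predicted_independent))
```

The simple sum is the stricter criterion, since it is never smaller than the root-sum-of-squares prediction for non-negative d' values.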

Conclusions • Same Fo not a sufficient condition for dichotic fusion of complementarily filtered speech. • Intelligibility increase for small ∆Fo confined to F1 region. Across-formant grouping only for larger ∆Fo. • Fo & VT-size useful for tracking sources across time. Superadditive.
