The Mind’s Ear How the Brain Listens to What the Ear Hears

Listening with multiple sources poses enormous challenges: I hear Shihab only ran 50 miles yesterday! Oh, no… my first Nature paper was before I won the Olympic trials… Didn’t Barb give this same talk last week? Don’t you get too interested in her periphery… (Cocktail Party by SLAW, Maniscalco Gallery)

Attention depends on competing objects being distinct • Listen for the telephone number from the male, metallic voice • Because the two talkers sound different, there is little problem hearing out the number…

We process only one thing at a time • Listen for the telephone number from the male, metallic voice • Because the male voice is distinct, there is little problem hearing out the number… • BUT WHAT WAS THE OTHER SIGNAL?

Auditory Processing: Analyzing sounds in complex reverberant environments requires • Extracting the spectrum of incoming sounds • Estimating the pitch of concurrent sources • Localizing and tracking them accurately • Perceiving and recognizing their timbre
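The first requirement, extracting the spectrum of incoming sounds, can be illustrated with a short-time magnitude spectrum. This is a minimal sketch assuming a plain windowed DFT (Hann window, 64-sample frames, 32-sample hop, all arbitrary choices); the auditory models discussed in this talk use a cochlear filter bank and further processing rather than an STFT.

```python
import cmath
import math

# Assumption: a plain STFT stands in for a cochlear (auditory spectrum) model.


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude short-time spectrum of `signal`.

    Returns one list of |DFT| values (bins 0..frame_len/2) per frame.
    A plain O(N^2) DFT keeps the sketch dependency-free; it is slow
    for long signals.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(seg))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Usage: a 1 kHz tone sampled at 8 kHz. Bin spacing is 8000 / 64 = 125 Hz,
# so the energy should peak in bin 8.
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(512)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), peak_bin, peak_bin * fs / 64)
```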

[Figure: spectrograms (frequency in Hz vs. time) of tones, a tone complex (frequency axis marked at 500, 750 and 1000 Hz), noise, and the vowels /a/, /i/, /u/ with formants F1, F2 and F3 marked.]

Unnatural Distortions [Figure panels: normal, down-shift, dilate, compress.]

An auditory scene [Figure: spectrogram, frequency vs. time.]

Singing voice [Figure: spectrogram, frequency vs. time.]

Harmonicity: Musical Pitch [Figure panels: musical pitch, residue pitch.]
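Residue pitch is the pitch heard at a fundamental that is physically absent, inferred from the spacing of its harmonics. One common way to model this is a harmonic template (or sieve). The sketch below is a simplified template matcher over integer candidate fundamentals; the candidate range, the 1% tolerance and the tie-breaking rule are assumptions for illustration, not the method behind these slides.

```python
def residue_pitch(components, f0_min=50, f0_max=500, tol=0.01):
    """Estimate a residue (virtual) pitch from component frequencies in Hz.

    Each integer candidate f0 in [f0_min, f0_max] is scored like a harmonic
    template: a component counts if it lies within relative tolerance `tol`
    of its nearest harmonic n*f0. Ties are broken by lower total mistuning,
    then by higher f0, so subharmonics (f0/2, f0/3, ...) that fit equally
    well are not chosen. All parameter values are illustrative assumptions.
    """
    best_key, best_f0 = None, None
    for f0 in range(f0_min, f0_max + 1):
        score, mistuning = 0, 0.0
        for f in components:
            n = max(1, round(f / f0))        # nearest harmonic number
            err = abs(f - n * f0) / f
            if err <= tol:
                score += 1
                mistuning += err
        key = (score, -mistuning, f0)
        if best_key is None or key > best_key:
            best_key, best_f0 = key, f0
    return best_f0


# Components 600, 800 and 1000 Hz contain no energy at 200 Hz, yet they are
# harmonics 3, 4 and 5 of a 200 Hz fundamental.
print(residue_pitch([600, 800, 1000]))
```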

Segregating Harmonic Sets [Figure: two spectrograms, frequency vs. time.]
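Once two fundamentals are known (for example from a pitch estimator), the partials of a mixture can be split into harmonic sets by asking which series each partial fits best. A minimal sketch, assuming the candidate fundamentals are given and using a hypothetical 1% relative mistuning tolerance:

```python
def mistuning(f, f0):
    """Relative distance between f and the nearest harmonic n*f0 (n >= 1)."""
    n = max(1, round(f / f0))
    return abs(f - n * f0) / f


def segregate(partials, f0s, tol=0.01):
    """Assign each partial (Hz) to the candidate fundamental it fits best.

    Partials farther than `tol` (relative, an assumed value) from every
    harmonic series are left unassigned. On an exact tie (a partial that is a
    harmonic of both fundamentals) the lower f0 wins; a real system would
    need a better rule.
    """
    groups = {g: [] for g in f0s}
    unassigned = []
    for f in partials:
        err, best = min((mistuning(f, g), g) for g in f0s)
        if err <= tol:
            groups[best].append(f)
        else:
            unassigned.append(f)
    return groups, unassigned


# Hypothetical mixture of a 200 Hz and a 310 Hz harmonic complex,
# plus one stray partial.
mix = [200, 310, 400, 600, 620, 800, 930, 1111]
print(segregate(mix, [200, 310]))
```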

Grouping by Onsets
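Grouping by onsets relies on components that start together being heard as one sound. A minimal sketch of that rule, assuming components are already described by frequency and onset time, with a hypothetical 20 ms grouping window:

```python
def group_by_onset(components, window_ms=20.0):
    """Group (frequency_hz, onset_ms) components that start together.

    Components are sorted by onset; a component joins the current group if it
    starts within `window_ms` (an assumed value) of that group's first onset,
    otherwise it opens a new group.
    """
    groups = []
    for freq, onset in sorted(components, key=lambda c: c[1]):
        if groups and onset - groups[-1][0][1] <= window_ms:
            groups[-1].append((freq, onset))
        else:
            groups.append([(freq, onset)])
    return groups


# Hypothetical components of two sounds: one starting near 0 ms,
# another starting near 100 ms.
comps = [(700, 100), (500, 0), (1400, 110), (1000, 5), (1500, 8)]
for g in group_by_onset(comps):
    print(g)
```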

AUDITORY OBJECTS • Streams • (What is it?) The perceptual phenomenon • (What’s it good for?) Relationships with other aspects of perception • (How does it come about?) Attention and Plasticity

“1 stream with a galloping rhythm” [Figure: repeating pattern of low A tones and high B tones, with A tones twice as frequent as B tones; frequency vs. time.]

“2 streams, one high and slow, the other low and fast” [Figure: A and B tones, frequency vs. time.] Note: when streamed, the relative timing between A and B tones becomes less important.
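The galloping-rhythm demonstration uses the classic A-B-A-silence triplet (van Noorden's paradigm). The sketch below generates such a sequence and renders it as audio samples; the base frequency, semitone separation, 100 ms tone duration, 16 kHz sample rate and 10 ms ramps are illustrative values, not the parameters of the demonstration in the talk.

```python
import math


def aba_sequence(f_a=500.0, semitones=6.0, tone_ms=100.0, n_triplets=4):
    """Event list (onset_ms, frequency_hz) for an A-B-A-silence pattern.

    B lies `semitones` above A. Each triplet occupies four tone slots, the
    last one silent, so A tones occur twice as often as B tones.
    Default values are illustrative assumptions.
    """
    f_b = f_a * 2 ** (semitones / 12)
    events = []
    for i in range(n_triplets):
        t0 = i * 4 * tone_ms
        events += [(t0, f_a), (t0 + tone_ms, f_b), (t0 + 2 * tone_ms, f_a)]
    return events


def render(events, tone_ms=100.0, fs=16000, ramp_ms=10.0):
    """Synthesize the events as sine tones with raised-cosine on/off ramps."""
    n_tone = int(tone_ms * fs / 1000)
    n_ramp = int(ramp_ms * fs / 1000)
    total_ms = max(t for t, _ in events) + tone_ms
    out = [0.0] * int(total_ms * fs / 1000)
    for t_ms, f in events:
        start = int(t_ms * fs / 1000)
        for n in range(n_tone):
            if n < n_ramp:
                env = 0.5 - 0.5 * math.cos(math.pi * n / n_ramp)
            elif n >= n_tone - n_ramp:
                env = 0.5 - 0.5 * math.cos(math.pi * (n_tone - 1 - n) / n_ramp)
            else:
                env = 1.0
            out[start + n] += env * math.sin(2 * math.pi * f * n / fs)
    return out


events = aba_sequence(semitones=6.0)
samples = render(events)
print(len(events), round(events[1][1], 1), len(samples))
```

Increasing `semitones` or shortening `tone_ms` generally makes the two-stream percept more likely; small separations at slow rates favour the single galloping stream.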

Streaming also depends on connectedness [Figure: two panels of alternating A and B tones, frequency vs. time.]

Streaming based on pitch differences [Figure: two panels of A and B tone sequences, frequency vs. time, with pitch as the distinguishing cue.] Musical melodies also stream [Figure: A and B notes vs. time, Telemann example.]

Streaming Based on Timbre: trumpet; cello; cello-trumpet (different spectral envelopes); alternating vowels /e/ and /a/.

Streaming is great for suppressing distractors [Figure panels: rhythmic masking and target masking, frequency (Hz) vs. time.]

But you need the glue to hold it all together!! [Figure: sinewave speech (S1, S2), with sinusoids following formants F1, F2 and F3; pulsed sinewave speech (S1-pulsed, S2-pulsed); frequency in Hz.]
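Sinewave speech replaces each formant with a single sinusoid that follows its frequency and amplitude over time. The sketch below synthesizes such a signal from formant tracks; the tracks in the usage lines are made up, and the `pulse_hz` option (a simple square-wave gate on the summed output) is a hypothetical stand-in for the pulsed condition, whose exact construction the slide does not give.

```python
import math


def sinewave_speech(tracks, frame_ms=10.0, fs=16000, pulse_hz=None):
    """Sum of time-varying sinusoids that follow formant tracks.

    tracks: one list per sinusoid of (frequency_hz, amplitude) values, one
    value per frame; values are linearly interpolated between frames.
    pulse_hz: if given, gate the whole output on/off with a square wave at
    this rate (a hypothetical 'pulsed' variant).
    """
    spf = int(frame_ms * fs / 1000)          # samples per frame
    n_total = (len(tracks[0]) - 1) * spf
    out = [0.0] * n_total
    for track in tracks:
        phase = 0.0                          # accumulate phase so glides stay smooth
        for n in range(n_total):
            i, r = divmod(n, spf)
            frac = r / spf
            (f1, a1), (f2, a2) = track[i], track[i + 1]
            freq = f1 + frac * (f2 - f1)
            amp = a1 + frac * (a2 - a1)
            phase += 2 * math.pi * freq / fs
            out[n] += amp * math.sin(phase)
    if pulse_hz:
        period = fs / pulse_hz
        for n in range(n_total):
            if n % period >= period / 2:     # second half of each cycle is silent
                out[n] = 0.0
    return out


# Hypothetical tracks: F1 rises, F2 falls, F3 stays put, over 20 frames.
n = 21
f1 = [(300 + 400 * k / (n - 1), 1.0) for k in range(n)]
f2 = [(2200 - 1000 * k / (n - 1), 0.5) for k in range(n)]
f3 = [(2600.0, 0.25) for _ in range(n)]
plain = sinewave_speech([f1, f2, f3])
pulsed = sinewave_speech([f1, f2, f3], pulse_hz=100)
print(len(plain), len(pulsed))
```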

Courtesy of Dr. Chris Darwin [Figure panels: speech, music.]

Continuity Illusion [Figure panels: tone in noise; glides in noise; speech in noise (frequency in Hz vs. time).]
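A tone-in-noise continuity stimulus can be built by deleting a stretch of a tone and optionally filling the gap with a louder noise burst. A minimal sketch, assuming white (uniform) noise and illustrative durations and levels; with the gap left silent the tone is heard as interrupted, while a sufficiently intense noise fill tends to make it sound continuous.

```python
import math
import random


def continuity_stimulus(f=1000.0, seg_ms=300.0, gap_ms=100.0, fs=16000,
                        fill_with_noise=True, noise_amp=3.0, seed=0):
    """Tone, gap, tone, with the gap optionally filled by a noise burst.

    The tone's phase runs on through the gap (it is simply switched off), so
    the two tone segments line up as if the tone had never stopped. The
    illusion is expected only when the noise is loud enough to have masked
    the tone, hence noise_amp well above the tone's unit amplitude.
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_seg = int(seg_ms * fs / 1000)
    n_gap = int(gap_ms * fs / 1000)
    out = []
    for n in range(2 * n_seg + n_gap):
        in_gap = n_seg <= n < n_seg + n_gap
        if in_gap:
            out.append(noise_amp * rng.uniform(-1.0, 1.0) if fill_with_noise else 0.0)
        else:
            out.append(math.sin(2 * math.pi * f * n / fs))
    return out


filled = continuity_stimulus()
silent_gap = continuity_stimulus(fill_with_noise=False)
print(len(filled), len(silent_gap))
```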

The Biological Bases of Auditory Scene Analysis: What is the role of attention in these percepts? Do streams exist in the absence of attention? (The falling tree in the forest.)

Auditory Scene Analysis [Figure: an acoustic auditory scene (an auditory mixture of two speakers) is disassembled into a multi-scale representation using primitive cues (harmonicity, onset/offset); sorting and streaming then yield auditory objects (speaker A and speaker B, frequency vs. time), supported by “learning” and plasticity (learning and adaptation: multi-resolution temporal cortical dynamics, plasticity, attention) and by the representation of consistent features (timbre/voice, location, stationary spectra).]

Exploring Streaming Mechanisms [Figure: the input spectrogram (frames t1 to t4) is disassembled into a cortical multiscale spectral representation; an input selector sorts it into learned streams ‘A’ and ‘B’ with dynamics at several rates (2 Hz, 4 Hz, 8 Hz, ...); “adaptive feedback” compares each stream’s prediction with the current input.]
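The feedback loop in this diagram, where each learned stream predicts the next input and the prediction is compared with what actually arrives, can be sketched as a toy sorter. This is a minimal illustration, not the model on the slide: it assumes two streams, hypothetical initial templates `init_a` and `init_b`, squared Euclidean distance as the comparison, and exponential averaging (rate `alpha`) as the adaptive update.

```python
def sort_frames(frames, init_a, init_b, alpha=0.3):
    """Assign each spectral frame to stream A or B by comparing predictions.

    Each stream keeps a prediction (a running average of the frames it has
    claimed). A frame goes to the stream whose prediction is closer in
    squared Euclidean distance, and only that stream's prediction is updated.
    The distance, update rule and alpha are assumptions for illustration.
    """
    preds = [list(init_a), list(init_b)]
    labels = []
    for x in frames:
        dists = [sum((xi - pi) ** 2 for xi, pi in zip(x, p)) for p in preds]
        k = 0 if dists[0] <= dists[1] else 1
        labels.append("AB"[k])
        # Adaptive feedback: nudge the winning prediction toward the input.
        preds[k] = [(1 - alpha) * pi + alpha * xi for xi, pi in zip(x, preds[k])]
    return labels, preds


# Hypothetical 4-channel frames alternating between a low and a high sound.
frames = [[1, 0, 0, 0], [0, 0, 0, 1], [0.9, 0.1, 0, 0], [0, 0.1, 0, 0.9]]
labels, _ = sort_frames(frames, init_a=[1, 0, 0, 0], init_b=[0, 0, 0, 1])
print("".join(labels))
```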

Rubin’s vase-faces, Necker’s cube: ambiguous stimuli with bi-stable percepts have been used successfully in the past to demonstrate single-unit correlates of visual percepts (not just of stimulus parameters), e.g., Logothetis & Schall (1989) Science; Leopold & Logothetis (1996) Nature.

[Figure: attributes of complex sounds (location, timbre, pitch) and the anatomy of the auditory system. Stages, from the sound input upward: early auditory stages (AVCN, PVCN, DCN), midbrain nuclei (TB, LL, NLL), collicular stages (IC) and central auditory stages (MGB). Computations labeled along the pathway: the auditory spectrum; ILD, ITD and spectral cues; harmonic templates; spatial maps; computing pitch.]

Experimental Set-up

Multi-Resolution Analysis with Different STRFs (spectro-temporal receptive fields) [Figure: STRFs, frequency (kHz) vs. time (ms).]

Scale-Rate Decomposition Reconstruction
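A scale-rate decomposition describes a spectrogram by its temporal modulation rate (Hz) and spectral modulation scale (cycles/octave). The cortical model presumably uses a bank of spectro-temporal filters; as a simpler stand-in, this sketch takes the 2-D DFT of a spectrogram laid out on a log-frequency axis and labels each bin with its rate and scale. The frame rate and channels-per-octave values in the usage lines are assumptions.

```python
import cmath
import math


def modulation_spectrum(spec, frame_rate_hz, chans_per_octave):
    """2-D DFT magnitude of spec[time][log-frequency channel].

    Returns (rate_hz, scale_cyc_per_oct, magnitude) for every 2-D bin.
    Rate is temporal modulation (Hz); scale is spectral modulation
    (cycles/octave). Negative values mark the mirror-image bins.
    A plain 2-D DFT stands in for a filter bank; O(T^2 F^2) loops.
    """
    T, F = len(spec), len(spec[0])
    result = []
    for k in range(T):
        for l in range(F):
            acc = 0j
            for t in range(T):
                for x in range(F):
                    acc += spec[t][x] * cmath.exp(-2j * math.pi * (k * t / T + l * x / F))
            rate = (k if k <= T // 2 else k - T) * frame_rate_hz / T
            scale = (l if l <= F // 2 else l - F) * chans_per_octave / F
            result.append((rate, scale, abs(acc)))
    return result


# A "moving ripple": 16 frames at 100 frames/s, 8 channels at 8 per octave,
# drifting at 12.5 Hz with 1 cycle/octave.
T, F = 16, 8
ripple = [[math.cos(2 * math.pi * (12.5 * t / 100 + 1.0 * x / 8)) for x in range(F)]
          for t in range(T)]
top = sorted(modulation_spectrum(ripple, 100.0, 8), key=lambda r: -r[2])[:2]
print([(r, s) for r, s, _ in top])
```

The ripple's energy lands in the bin pair at (12.5 Hz, 1 cycle/octave) and its mirror image; the sign pairing distinguishes upward from downward drifting ripples.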

Patterns of Musical Timbre

Integrated vs. Streamed [Figure: tone A and tone B sequences over time, with the initial STRF and the streamed STRF.] STRFs may evolve or adapt within seconds!

Enhancing Excitatory Fields, Weakening Inhibitory Fields [Figure panel C, time (ms).]

Manipulating Temporal and Spectral Modulations [Figure: four spectrograms (frequency axis marked 125 to 2000, time in ms): normal, spectrally smeared, temporally smeared, temporally sharpened.]
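Smearing and sharpening can be thought of as filtering a spectrogram's modulations: averaging along time removes fast temporal modulations, averaging across channels removes fine spectral detail, and unsharp masking emphasizes fast temporal changes. This is a minimal sketch with moving-average kernels, an assumed choice; the manipulations used to make the figures may differ.

```python
def smooth_time(spec, width):
    """Moving average along time (rows) of spec[time][channel].

    A low-pass on temporal modulations, i.e. temporal smearing. Near the
    edges the window is truncated and the mean is taken over fewer frames.
    """
    T, F = len(spec), len(spec[0])
    half = width // 2
    out = []
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        out.append([sum(spec[u][c] for u in range(lo, hi)) / (hi - lo) for c in range(F)])
    return out


def transpose(m):
    return [list(row) for row in zip(*m)]


def smooth_freq(spec, width):
    """Moving average across channels: spectral smearing."""
    return transpose(smooth_time(transpose(spec), width))


def sharpen_time(spec, width, gain=1.0):
    """Unsharp masking along time: boosts fast temporal modulations."""
    blurred = smooth_time(spec, width)
    return [[x + gain * (x - b) for x, b in zip(row, brow)]
            for row, brow in zip(spec, blurred)]


# A single click in channel 0 at frame 2 of a 5-frame, 3-channel spectrogram.
click = [[0.0] * 3 for _ in range(5)]
click[2][0] = 1.0
print([round(r[0], 3) for r in smooth_time(click, 3)])
print([round(v, 3) for v in smooth_freq(click, 3)[2]])
print([round(r[0], 3) for r in sharpen_time(click, 3)])
```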

Morph Voices

Acknowledgment: Cortical Physiology and Auditory Computations: Jonathan Fritz, Didier Depireux, David Klein, Jonathan Simon. Auditory Speech and Music Processing: Tai Chi, Mounya Elhilali, Powen Ru, Nima Mesgarani. Supported by: MURI # N00014-97-1-0501 from the Office of Naval Research; # NIDCD T32 DC00046-01 from the NIDCD; # NSFD CD8803012 from the National Science Foundation.

Maybe, but: that neural responses in auditory cortex depend on both F and T is hardly a surprise. This is insufficient evidence that streaming is reflected in neural responses in the auditory cortex. A much more convincing correlate of streaming would be obtained if neural responses were shown to co-vary with the percept while the physical stimulus remains unchanged...

[Figure: mixed spectrogram A + B of two speakers; original spectrograms of speakers A and B; segregated spectrograms A and B obtained from Kalman filter outputs and spectral timbre templates (with a scale axis). Frequency 125 to 2000 Hz, time up to 2000 ms.]
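The segregation credits Kalman filter outputs but does not give the state model. As a hedged illustration of the filtering step only, the sketch below tracks one scalar quantity (say, the frequency of a spectral peak, a hypothetical choice) under an assumed random-walk model with process variance `q` and observation variance `r`; it does not reproduce the timbre-template part of the system.

```python
def kalman_track(observations, q=1.0, r=25.0, p0=1e3):
    """Scalar Kalman filter under an assumed random-walk model.

    State:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation: z_t = x_t + v_t,      v_t ~ N(0, r)
    Returns the filtered estimate after each observation.
    """
    x, p = observations[0], p0       # start at the first observation, high uncertainty
    estimates = []
    for z in observations:
        p = p + q                    # predict: uncertainty grows
        k = p / (p + r)              # Kalman gain
        x = x + k * (z - x)          # update toward the observation
        p = (1 - k) * p
        estimates.append(x)
    return estimates


# A hypothetical 100 Hz peak observed with +/-5 Hz alternating error.
obs = [105.0 if i % 2 == 0 else 95.0 for i in range(50)]
est = kalman_track(obs)
print(round(est[-1], 2))
```

Raising `r` relative to `q` makes the filter trust its prediction more and smooths the track further, at the cost of following genuine changes more slowly.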
