The Mind’s Ear How the Brain Listens to What the Ear Hears

Shihab Shamma, Institute for Systems Research, Electrical and Computer Engineering, University of Maryland, College Park

Auditory Processing: Analyzing sounds in complex reverberant environments requires: • Extracting the spectrum of incoming sounds • Estimating the pitch of concurrent sources • Localizing and tracking them accurately • Perceiving and recognizing their timbre

Auditory Scene Analysis Segregating multiple sound sources monaurally

[Figure: spectrograms (frequency in Hz vs. time) of tones, a tone complex, noise, and the vowels /a/, /i/, /u/ with formants F1, F2, F3 marked; frequency axis labeled at 500, 750, and 1000 Hz]

Music & Speech Spectrograms [Figure: spectrograms of a violin (vibrato), a piano, and the spoken phrase "right away" with formants F1, F2, F3 marked; frequency (Hz) vs. time]

Unnatural Distortions • Normal • Down-Shift • Dilate • Compress

An auditory scene [Figure: frequency vs. time]

Two classes of ASA processes • Simultaneous processes • Sequential processes [Figure: frequency vs. time]

Simultaneous ASA Processes • Grouping Concurrent Sounds • (What is it?) The perceptual phenomenon • Harmonicity, Onset • (What’s it good for?) Relationships with other aspects of perception

Harmonicity: Musical Pitch [Figure labels: Residue Pitch, Musical Pitch]

Perceived pitch = fundamental frequency (regardless!) [Figure: spectrograms of a full harmonic series and of missing fundamentals; frequency axis from 125 to 4000 Hz vs. time]
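
The missing-fundamental effect stated on this slide can be illustrated with a toy pitch estimator. The following is a minimal sketch, assuming a plain autocorrelation estimator as a textbook stand-in (not necessarily the model used in this talk). The function names, the 8 kHz sample rate, the 50 to 1000 Hz search range, and the 200 Hz example are all illustrative assumptions.

```python
import math


def synthesize(freqs_hz, sample_rate=8000, duration_s=0.1):
    """Sum of equal-amplitude sinusoids at the given frequencies."""
    n_samples = int(sample_rate * duration_s)
    return [
        sum(math.sin(2 * math.pi * f * n / sample_rate) for f in freqs_hz)
        for n in range(n_samples)
    ]


def autocorr_pitch(signal, sample_rate=8000, fmin=50.0, fmax=1000.0):
    """Estimate pitch as the shortest lag whose autocorrelation is near the maximum."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    window = len(signal) - max_lag  # same number of products for every lag
    scores = {
        lag: sum(signal[i] * signal[i + lag] for i in range(window))
        for lag in range(min_lag, max_lag + 1)
    }
    best = max(scores.values())
    # Taking the shortest near-maximal lag avoids reporting a sub-octave.
    for lag in range(min_lag, max_lag + 1):
        if scores[lag] >= 0.99 * best:
            return sample_rate / lag


full_series = synthesize([200, 400, 600, 800])
missing_fundamental = synthesize([400, 600, 800])  # no energy at 200 Hz

print(autocorr_pitch(full_series))          # 200.0
print(autocorr_pitch(missing_fundamental))  # 200.0
```

Both signals repeat every 5 ms, so the estimator reports 200 Hz in both cases even though the second signal has no component at 200 Hz.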

Spectral Grouping or “Fusion” of Harmonics • Mistuning a harmonic breaks fusion • Fusion is found in humans and animals alike • Fusion also breaks with onset mismatches
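
One simple way to picture why a mistuned harmonic drops out of the fused percept is a "harmonic sieve": components close to an integer multiple of the fundamental pass, the rest are rejected. The sketch below is hypothetical; the function name and the 3% tolerance are assumptions chosen for illustration, not values taken from this talk.

```python
def harmonic_sieve(components_hz, f0_hz, tolerance=0.03):
    """Split components into those that fit the harmonic series of f0 and those that do not.

    tolerance is the allowed relative deviation from the nearest harmonic
    (an assumed illustrative value).
    """
    fused, segregated = [], []
    for f in components_hz:
        harmonic_number = max(1, round(f / f0_hz))
        target = harmonic_number * f0_hz
        deviation = abs(f - target) / target
        if deviation <= tolerance:
            fused.append(f)
        else:
            segregated.append(f)
    return fused, segregated


# Third harmonic of 200 Hz mistuned upward by 8% (600 Hz -> 648 Hz).
fused, segregated = harmonic_sieve([200.0, 400.0, 648.0, 800.0], f0_hz=200.0)
print(fused)       # [200.0, 400.0, 800.0]
print(segregated)  # [648.0]
```

The sieve only models harmonicity; the onset-mismatch cue listed on the slide would need a separate test on component start times.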

Segregating Harmonic Sets [Figure: two panels, frequency vs. time]

Grouping by Onsets

Sequential ASA processes • Streaming • (What is it?) The perceptual phenomenon • (What’s it good for?) Relationships with other aspects of perception • (How does it come about?) Attention and Plasticity

[Figure: alternating A and B tones separated in frequency by dF; frequency vs. time] Miller & Heise (1950), Bregman & Campbell (1971), … Bregman (1990), …

“1 stream of sounds jumping up and down in pitch” [Figure: alternating A and B tones; frequency vs. time]

[Figure: alternating A and B tones separated in frequency by dF; frequency vs. time]

“2 streams, one high, one low” [Figure: alternating A and B tones; frequency vs. time] Note: you can only attend to one stream at a time.

[Figure: tone sequence with B tones presented at half the rate of A tones; frequency vs. time]

“1 stream with a galloping rhythm” [Figure: A and B tone sequence; frequency vs. time]

“2 streams, one high and slow, the other low and fast” [Figure: A and B tone sequence; frequency vs. time] Note: when streamed, the relative timing between A and B tones becomes less important.

Streaming also depends on temporal parameters [Figure: alternating A and B tones with the time interval dt marked; frequency vs. time]
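
The A/B tone sequences in these streaming demonstrations are fully described by two parameters, the frequency separation dF and the timing dt. The sketch below generates such a sequence as a list of events. It is illustrative only: the function name, expressing dF in semitones, treating dt as the onset-to-onset interval, and the "_" silent-slot notation are all assumptions.

```python
def tone_sequence(freq_a_hz, df_semitones, dt_s, n_cycles, pattern="AB"):
    """Return (onset_s, label, freq_hz) events for a repeating tone pattern.

    pattern is a string of slots: "A", "B", or "_" for a silent slot.
    "AB" gives strict alternation; "ABA_" gives a galloping triplet.
    """
    freq_b_hz = freq_a_hz * 2 ** (df_semitones / 12)
    freqs = {"A": freq_a_hz, "B": freq_b_hz}
    events = []
    for slot in range(n_cycles * len(pattern)):
        label = pattern[slot % len(pattern)]
        if label in freqs:  # silent slots produce no event
            events.append((slot * dt_s, label, freqs[label]))
    return events


alternating = tone_sequence(500.0, df_semitones=12, dt_s=0.1, n_cycles=2)
gallop = tone_sequence(500.0, df_semitones=12, dt_s=0.1, n_cycles=2, pattern="ABA_")

# If the listener hears two streams, each stream is just the events of one label.
stream_a = [e for e in gallop if e[1] == "A"]
stream_b = [e for e in gallop if e[1] == "B"]
```

Sweeping `df_semitones` and `dt_s` over a grid reproduces the stimulus space of these demonstrations; which percept a listener reports at each point is an empirical question the code does not answer.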

Streaming also depends on connectedness [Figure: two panels of A and B tone sequences; frequency vs. time]

Streaming based on Pitch differences [Figure: two panels of A and B tone sequences; frequency vs. time, and pitch vs. time] • Musical melodies also stream (example: Telemann)

Streaming Based on Timbre • Different spectral envelopes: Trumpet, Cello, Cello-Trumpet • Alternating vowels /e/ and /a/

Streaming: What’s it good for? Rhythmic Masking [Figure: frequency (Hz) vs. time] Target Masking [Figure: frequency (Hz) vs. time]

Simultaneous cues help perception of speech • Sinewave speech (S1, S2) • Pulsed sinewave speech (S1-pulsed, S2-pulsed) [Figures: sinewave tracks of formants F1, F2, F3; frequency (Hz) vs. time]

Courtesy of Dr. Chris Darwin: Speech, Music

Courtesy of Dr. Chris Darwin: Speech, Music

Continuity Illusion • Tone in noise • Glides in noise • Speech in noise [Figures: frequency (Hz) vs. time]

The Biological Bases of Auditory Scene Analysis

Auditory Scene Analysis [Diagram: an auditory mixture (two speakers) forms the acoustic auditory “scene”; stages labeled Disassembling (multi-scale representation), Sorting and Streaming, and “Learning”/Plasticity; output objects Speaker A and Speaker B; inset axes frequency vs. time] • Primitive cues: harmonicity, onset/offset • Representation of consistent features: timbre (voice), location, stationary spectra • Learning and adaptation: multi-resolution temporal cortical dynamics, plasticity, attention

Anatomy of the Auditory System and Attributes of Complex Sounds [Diagram: attributes: location, timbre, pitch; stage labels: Early Auditory Stages, Midbrain Nuclei, Collicular Stages, Central Auditory Stages; structure labels: AVCN, PVCN, DCN, TB, LL, NLL, IC, MGB; function labels: the auditory spectrum; ILD, ITD, spectral cues; harmonic templates; spatial maps; computing pitch; input: sound]


Experimental Set-up

Multi-Resolution Analysis with Different STRFs [Figure: STRFs; frequency (kHz) vs. time (ms)]
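
A spectro-temporal receptive field (STRF) is commonly read as a linear filter on the spectrogram: the predicted response at time t is the sum, over frequency channels and time lags, of the STRF weight times the spectrogram value at that channel and lag. The sketch below implements that standard linear prediction on nested lists. It is a minimal illustration, not the analysis code behind these slides; the function name and the toy STRF values are assumptions.

```python
def strf_response(spectrogram, strf):
    """Linear STRF prediction: r[t] = sum over f, lag of strf[f][lag] * spectrogram[f][t - lag].

    spectrogram: list of frequency channels, each a list over time frames.
    strf: list of frequency channels, each a list over time lags (lag 0 first).
    """
    n_freq = len(spectrogram)
    n_time = len(spectrogram[0])
    n_lag = len(strf[0])
    response = []
    for t in range(n_time):
        total = 0.0
        for f in range(n_freq):
            # Only lags that reach back to frame 0 or later contribute.
            for lag in range(min(n_lag, t + 1)):
                total += strf[f][lag] * spectrogram[f][t - lag]
        response.append(total)
    return response


# Toy STRF: channel 0 is excitatory at lag 0 and inhibitory at lag 2.
toy_strf = [[1.0, 0.0, -0.5],
            [0.0, 0.0, 0.0]]
# Brief tone in channel 0 at frame 1.
toy_spectrogram = [[0.0, 1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0, 0.0]]

print(strf_response(toy_spectrogram, toy_strf))  # [0.0, 1.0, 0.0, -0.5, 0.0]
```

Running the same spectrogram through STRFs of different bandwidths and time courses is one way to read "multi-resolution analysis" here: each STRF emphasizes a different spectral and temporal scale of the input.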

Scale-Rate Decomposition Reconstruction
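
A scale-rate decomposition describes a spectrogram by its modulations along the frequency axis (scale) and along the time axis (rate). As a rough illustration, the sketch below uses a plain two-dimensional Fourier transform as a stand-in and recovers the scale and rate of a synthetic ripple. This is an assumption made for clarity: cortical models of this kind typically use banks of spectro-temporal filters rather than a global transform, and all names and units here (cycles per analysis window) are hypothetical.

```python
import cmath
import math


def ripple(n_freq, n_time, scale_cycles, rate_cycles):
    """Synthetic spectrogram: a sinusoidal ripple drifting across frequency and time."""
    return [
        [1.0 + math.cos(2 * math.pi * (scale_cycles * f / n_freq + rate_cycles * t / n_time))
         for t in range(n_time)]
        for f in range(n_freq)
    ]


def scale_rate_magnitudes(spec):
    """Magnitude of the 2D DFT, indexed by (scale, signed rate); the mean is removed first."""
    n_freq, n_time = len(spec), len(spec[0])
    mean = sum(map(sum, spec)) / (n_freq * n_time)
    mags = {}
    for k in range(n_freq // 2 + 1):          # non-negative scales only
        for l in range(n_time):
            acc = 0j
            for f in range(n_freq):
                for t in range(n_time):
                    phase = -2j * math.pi * (k * f / n_freq + l * t / n_time)
                    acc += (spec[f][t] - mean) * cmath.exp(phase)
            rate = l if l <= n_time // 2 else l - n_time  # signed rate
            mags[(k, rate)] = abs(acc)
    return mags


def dominant_modulation(spec):
    """Return the (scale, rate) pair carrying the most modulation energy."""
    mags = scale_rate_magnitudes(spec)
    return max(mags, key=mags.get)


print(dominant_modulation(ripple(16, 16, scale_cycles=2, rate_cycles=3)))   # (2, 3)
print(dominant_modulation(ripple(16, 16, scale_cycles=2, rate_cycles=-3)))  # (2, -3)
```

The sign of the rate distinguishes the two drift directions of the ripple. Reconstruction in this simplified setting would be the inverse transform of the same coefficients.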

Patterns of Musical Timbre

Exploring Streaming Mechanisms [Diagram: • Disassembling: the input spectrogram (frequency vs. time, frames t1..t4, ...) is mapped to a cortical multiscale spectral representation • Dynamics: 2 Hz, 4 Hz, 8 Hz, ... • Sorting and streaming: an input selector, with learned stream ‘A’ and learned stream ‘B’ • Compare prediction with current input • “Adaptive feedback”]

[Figure: integrated vs. streamed conditions for Tone A and Tone B; panels: Initial STRF, Streamed STRF; time axes] STRF may evolve or adapt within seconds!

Enhancing Excitatory Fields • Weakening Inhibitory Fields [Figure: time axis in ms]

Manipulating Temporal and Spectral Modulations [Figure: four spectrograms (Normal, Spectrally smeared, Temporally smeared, Temporally sharpened); frequency axis ticks from 125 to 2000, time axis in ms]

Morph Voices

Acknowledgment • Cortical Physiology and Auditory Computations: Jonathan Fritz, Didier Depireux, David Klein, Jonathan Simon • Auditory Speech and Music Processing: Tai Chi, Mounya ElHilali, Powen Ru, Nima Mesgarani • Supported by: MURI # N00014-97-1-0501 from the Office of Naval Research; NIDCD T32 DC00046-01 from the NIDCD; NSFD CD8803012 from the National Science Foundation
