The Mind’s Ear: How the Brain Listens to What the Ear Hears
Shihab Shamma
Institute for Systems Research, Electrical and Computer Engineering
University of Maryland, College Park
Auditory Processing
Analyzing sounds in complex reverberant environments requires:
• Extracting the spectrum of incoming sounds
• Estimating the pitch of concurrent sources
• Localizing and tracking them accurately
• Perceiving and recognizing their timbre
Auditory Scene Analysis Segregating multiple sound sources monaurally
[Figure: spectrograms of Tones, a Tone Complex (500, 750, 1000 Hz), Noise, and the Vowels /a/, /i/, /u/ with formants F1 to F3; axes: frequency (Hz) vs. time]
Music & Speech Spectrograms [Figure: Violin (vibrato), Piano, and the spoken phrase "right away" with formants F1 to F3; axis: frequency (Hz)]
Unnatural Distortions [Examples: Normal, Down-Shift, Dilate, Compress]
An auditory scene [Figure; axes: frequency vs. time]
Two classes of ASA processes: Simultaneous processes and Sequential processes [Figure; axes: frequency vs. time]
Simultaneous ASA Processes
• Grouping Concurrent Sounds
• (What is it?) The perceptual phenomenon: Harmonicity, Onset
• (What’s it good for?) Relationships with other aspects of perception
Harmonicity: Musical Pitch [Figure labels: Residue Pitch, Musical Pitch]
Perceived pitch = fundamental frequency (regardless!) [Figure: two panels, Full harmonic series and Missing fundamentals; axes: frequency (Hz), ticks from 125 to 4000, vs. time]
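The missing-fundamental effect on this slide can be illustrated numerically. The sketch below is not taken from the talk: it assumes a 200 Hz fundamental, a 16 kHz sampling rate, and a plain autocorrelation peak picker as a crude stand-in for a pitch estimator. The function names `harmonic_complex` and `autocorr_pitch` are hypothetical.

```python
import numpy as np

def harmonic_complex(f0, harmonics, fs=16000, dur=0.2):
    """Sum of equal-amplitude sinusoids at the given harmonic numbers of f0."""
    t = np.arange(int(fs * dur)) / fs
    return sum(np.sin(2 * np.pi * f0 * h * t) for h in harmonics)

def autocorr_pitch(x, fs, fmin=50.0, fmax=1000.0):
    """Estimate pitch as fs / lag of the largest autocorrelation peak
    within the lag range corresponding to [fmin, fmax]."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags only
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi + 1])
    return fs / lag

fs = 16000
full = harmonic_complex(200.0, range(1, 7), fs)     # harmonics 1..6
missing = harmonic_complex(200.0, range(2, 7), fs)  # harmonics 2..6, no 200 Hz component

print("full series:        ", autocorr_pitch(full, fs))
print("missing fundamental:", autocorr_pitch(missing, fs))
```

Both estimates should land at about 200 Hz even though the second signal contains no energy at 200 Hz, because the waveform still repeats every 1/200 s. Autocorrelation is used here only because it is simple; it is one of several ways to model this percept.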
Spectral Grouping or “Fusion” of Harmonics [Figure: Mistuning a harmonic]
• Fusion is found in humans and animals alike
• Fusion also breaks with onset mismatches
Segregating Harmonic Sets [Figure: two panels; axes: frequency vs. time]
Sequential ASA processes
• Streaming
• (What is it?) The perceptual phenomenon
• (What’s it good for?) Relationships with other aspects of perception
• (How does it come about?) Attention and Plasticity
[Figure: alternating low A tones and high B tones separated in frequency by dF; axes: frequency vs. time] Miller & Heise (1950), Bregman & Campbell (1971), … Bregman (1990), …
“1 stream of sounds jumping up and down in pitch” [Figure: alternating A and B tones; axes: frequency vs. time]
[Figure: alternating A and B tones separated in frequency by dF; axes: frequency vs. time]
“2 streams, one high, one low” [Figure: alternating A and B tones; axes: frequency vs. time] Note: you can only attend to one stream at a time.
[Figure: A and B tones, with fewer B tones than A tones; axes: frequency vs. time]
“1 stream with a galloping rhythm” [Figure: A and B tones, with fewer B tones than A tones; axes: frequency vs. time]
“2 streams, one high and slow, the other low and fast” [Figure: A and B tones, with fewer B tones than A tones; axes: frequency vs. time] Note: when streamed, the relative timing between A and B tones becomes less important.
Streaming also depends on temporal parameters [Figure: alternating A and B tones with time interval dt; axes: frequency vs. time]
Streaming also depends on connectedness [Figure: two panels of alternating A and B tones; axes: frequency vs. time]
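The alternating A/B tone sequences on the preceding slides can be synthesized with a few lines of code, which makes it easy to vary the frequency separation dF and the timing dt and listen for one stream versus two. This is a minimal sketch under stated assumptions: dF is expressed in semitones, dt is treated as the silent gap between successive tones, and the default values (500 Hz, 75 ms tones, 5 ms ramps) are illustrative choices, not parameters from the talk. The names `tone` and `abab_sequence` are hypothetical.

```python
import numpy as np

def tone(freq, dur, fs, ramp=0.005):
    """Pure tone with linear onset/offset ramps to avoid clicks."""
    t = np.arange(int(round(fs * dur))) / fs
    x = np.sin(2 * np.pi * freq * t)
    n = int(round(fs * ramp))
    env = np.ones_like(x)
    env[:n] = np.linspace(0.0, 1.0, n)
    env[-n:] = np.linspace(1.0, 0.0, n)
    return x * env

def abab_sequence(f_a=500.0, df_semitones=6.0, tone_dur=0.075,
                  dt=0.025, n_pairs=10, fs=16000):
    """Return (waveform, f_b) for an A B A B ... sequence.

    f_b lies df_semitones above f_a; dt is the silent gap (assumption)
    inserted after every tone.
    """
    f_b = f_a * 2.0 ** (df_semitones / 12.0)
    gap = np.zeros(int(round(fs * dt)))
    pair = np.concatenate([tone(f_a, tone_dur, fs), gap,
                           tone(f_b, tone_dur, fs), gap])
    return np.tile(pair, n_pairs), f_b

# Small dF: tends to be heard as one stream jumping up and down in pitch.
one_stream, _ = abab_sequence(df_semitones=1.0)
# Large dF: tends to split into a high stream and a low stream.
two_streams, f_b = abab_sequence(df_semitones=12.0)
print(len(two_streams), f_b)
```

The arrays can be written to a sound file or played back with any audio library; shortening `dt` or `tone_dur` at a fixed dF generally makes segregation more likely, in line with the slide on temporal parameters.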
Streaming based on Pitch differences [Figure: two panels of A and B tone sequences, labeled PITCH; axes: frequency vs. time] Musical melodies also stream [Figure: Telemann example with A and B tones over time]
Streaming Based on Timbre: Different Spectral Envelopes [Examples: Trumpet, Cello, Cello-Trumpet; Alternating Vowels /e/ and /a/]
Streaming: What’s it good for? [Figure: two panels, Rhythmic Masking and Target Masking; axes: frequency (Hz) vs. time]
Simultaneous cues help perception of speech [Figure: two panels, Sinewave Speech (S1, S2) and Pulsed Sinewave Speech (S1-pulsed, S2-pulsed), each with formants F1 to F3; axis: frequency (Hz)]
Courtesy of Dr. Chris Darwin [Examples: speech, music]
Continuity Illusion [Figure: three panels, Tone in Noise, Glides in Noise, Speech in Noise; axes: frequency (Hz) vs. time]
The Biological Bases of Auditory Scene Analysis
Auditory Scene Analysis
[Diagram: an acoustic mixture (two speakers) forms the auditory “scene”; primitive cues plus a multi-scale representation lead to auditory objects (Speaker A, Speaker B). Stages: Disassembling; Sorting and Streaming; “Learning” / Plasticity. Spectrogram axes: frequency vs. time]
Primitive cues
* Harmonicity
* Onset/Offset
Representation of Consistent Features
* Timbre (Voice)
* Location
* Stationary spectra
Learning and Adaptation
* Multi-resolution temporal cortical dynamics
* Plasticity
* Attention
Attributes of Complex Sounds: Location, Timbre, Pitch
Anatomy of the Auditory System
[Diagram labels, from top to bottom: Central Auditory Stages (spatial maps; computing pitch; MGB, IC); Collicular Stages (NLL; harmonic templates; ILD, ITD; spectral cues; LL); Midbrain Nuclei (TB); the auditory spectrum; Early Auditory Stages (DCN, PVCN, AVCN); sound]
Multi-Resolution Analysis with Different STRFs [Figure; axes: frequency (kHz) vs. time (ms)]
Scale-Rate Decomposition Reconstruction
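The idea of decomposing a spectrogram into spectral modulations (scales) and temporal modulations (rates), and then reconstructing it, can be sketched with a plain 2-D Fourier transform. This is a simplification chosen for illustration and is not the cortical filter-bank model shown in the talk. It assumes a log-frequency spectrogram with a known number of channels per octave and a known frame rate; the function names and the test ripple are hypothetical.

```python
import numpy as np

def scale_rate_decompose(spec, frame_rate, chans_per_octave):
    """2-D FFT of a (frequency x time) log-frequency spectrogram.

    Returns the complex coefficients together with the scale axis
    (cycles/octave) and the rate axis (Hz).
    """
    coeffs = np.fft.fft2(spec)
    scales = np.fft.fftfreq(spec.shape[0], d=1.0 / chans_per_octave)
    rates = np.fft.fftfreq(spec.shape[1], d=1.0 / frame_rate)
    return coeffs, scales, rates

def reconstruct(coeffs):
    """Invert the decomposition; the input spectrogram is real-valued."""
    return np.fft.ifft2(coeffs).real

# Illustrative input: a moving ripple at 1 cycle/octave and 4 Hz,
# sampled on 48 channels (2 octaves) and 100 frames (1 second).
chans_per_octave, frame_rate = 24, 100
x = np.arange(48) / chans_per_octave   # position in octaves
t = np.arange(100) / frame_rate        # time in seconds
ripple = 1.0 + np.cos(2 * np.pi * (4.0 * t[np.newaxis, :] + 1.0 * x[:, np.newaxis]))

coeffs, scales, rates = scale_rate_decompose(ripple, frame_rate, chans_per_octave)
mag = np.abs(coeffs)
mag[0, 0] = 0.0                        # ignore the DC term
i, j = np.unravel_index(np.argmax(mag), mag.shape)
print("dominant scale (cyc/oct):", abs(scales[i]), " dominant rate (Hz):", abs(rates[j]))
print("reconstruction matches:", np.allclose(reconstruct(coeffs), ripple))
```

A Fourier basis gives a global decomposition, whereas a bank of STRF-like filters keeps the analysis local in time and frequency; the sketch only conveys what the scale and rate axes mean.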
Exploring Streaming Mechanisms [Diagram labels: Disassembling: input spectrogram (frequency vs. time, frames t1..t4), cortical multiscale spectral representation (2 Hz, 4 Hz, 8 Hz, ...). Sorting and Streaming: input selector, Learned Stream ‘A’, Learned Stream ‘B’, ..., dynamics, “adaptive feedback”, compare prediction with current input]
Integrated vs. Streamed [Figure: Tone A and Tone B sequences with the Initial STRF and the Streamed STRF; axis: time] STRF may evolve or adapt within seconds!
Enhancing Excitatory Fields, Weakening Inhibitory Fields [Figure; axis: time (ms)]
Manipulating Temporal and Spectral Modulations [Figure: four spectrogram panels, Normal, Spectrally smeared, Temporally smeared, Temporally sharpened; axes: frequency (ticks from 125 to 2000) vs. time (ms)]
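One simple way to picture spectral and temporal smearing is as low-pass filtering in the modulation domain: discarding fast temporal modulations blurs the spectrogram along time, and discarding fine spectral modulations blurs it along frequency. The sketch below assumes this interpretation and uses a 2-D FFT with a hard cutoff. How the panels on the slide were actually produced is not stated in the source, and sharpening is not covered here. The function name `smear` and its parameters are hypothetical.

```python
import numpy as np

def smear(spec, frame_rate, chans_per_octave, max_rate=None, max_scale=None):
    """Low-pass a (frequency x time) spectrogram in the modulation domain.

    max_rate  : keep temporal modulations up to this rate (Hz)
    max_scale : keep spectral modulations up to this scale (cycles/octave)
    """
    coeffs = np.fft.fft2(spec)
    scales = np.abs(np.fft.fftfreq(spec.shape[0], d=1.0 / chans_per_octave))
    rates = np.abs(np.fft.fftfreq(spec.shape[1], d=1.0 / frame_rate))
    keep = np.ones(spec.shape, dtype=bool)
    if max_rate is not None:
        keep &= (rates <= max_rate)[np.newaxis, :]
    if max_scale is not None:
        keep &= (scales <= max_scale)[:, np.newaxis]
    return np.fft.ifft2(coeffs * keep).real

# Illustrative input: a slow (2 Hz) and a fast (16 Hz) temporal modulation
# applied equally to all 48 frequency channels over 1 second.
chans_per_octave, frame_rate = 24, 100
t = np.arange(100) / frame_rate
slow = np.cos(2 * np.pi * 2.0 * t)
fast = np.cos(2 * np.pi * 16.0 * t)
spec = np.ones((48, 1)) * (1.0 + slow + fast)[np.newaxis, :]

temporally_smeared = smear(spec, frame_rate, chans_per_octave, max_rate=8.0)
print(np.allclose(temporally_smeared, np.ones((48, 1)) * (1.0 + slow)))
```

With `max_rate=8.0` the 16 Hz modulation is removed while the 2 Hz modulation survives. A hard cutoff is the crudest possible filter; a smooth roll-off would avoid ringing on real spectrograms.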
Morph Voices
Acknowledgment
Cortical Physiology and Auditory Computations: Jonathan Fritz, Didier Depireux, David Klein, Jonathan Simon
Auditory Speech and Music Processing: Tai Chi, Mounya ElHilali, Powen Ru, Nima Mesgarani
Supported by:
MURI # N00014-97-1-0501 from the Office of Naval Research
# NIDCD T32 DC00046-01 from the NIDCD
# NSFD CD8803012 from the National Science Foundation