Auditory scene analysis Day 15 Music Cognition MUSC 495.02, NSCI 466, NSCI 710.03 Harry Howard Barbara Jazwinski Tulane University
Course administration • Spend provost's money
Goals for today • Statement of the problem
The ball-room problem (Helmholtz, 1863) "In the interior of a ball-room … there are a number of musical instruments in action, speaking men and women, rustling garments, gliding feet, clinking glasses, and so on … a tumbled entanglement [that is] complicated beyond conception. And yet … the ear is able to distinguish all the separate constituent parts of this confused whole."
… which is a well-known problem in speech perception • "One of our most important faculties is our ability to listen to, and follow, one speaker in the presence of others. This is such a common experience that we may take it for granted; we may call it 'the cocktail party problem'…" (Cherry, 1957) • "For 'cocktail party'-like situations… when all voices are equally loud, speech remains intelligible for normal-hearing listeners even when there are as many as six interfering talkers" (Bronkhorst & Plomp, 1992)
What would the analog be in music? • The orchestra problem?
Model as sources of intrusion and distortion • Additive noise from other sound sources • Channel distortion • Reverberation from surface reflections
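This slide's diagram reduces to a standard signal model: the received signal is the clean source convolved with a channel/room impulse response, plus additive noise, $y(t) = (x * h)(t) + n(t)$. A minimal sketch, with every signal a synthetic placeholder:

```python
# Intrusion/distortion model: convolve the clean source x with a toy
# room/channel impulse response h (distortion, reverberation) and add
# noise n from other sound sources. All signals here are synthetic.
import numpy as np

fs = 16000                                   # sample rate (Hz), assumed
t = np.arange(fs) / fs                       # one second
x = np.sin(2 * np.pi * 440 * t)              # clean source: a 440 Hz tone

# Toy room impulse response: direct path plus two decaying reflections.
h = np.zeros(2000)
h[0], h[800], h[1600] = 1.0, 0.4, 0.15

n = 0.05 * np.random.randn(len(x))           # additive noise from other sources

y = np.convolve(x, h)[:len(x)] + n           # received mixture: y = x*h + n
```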
Some review with new information about computational/mathematical modeling
The auditory periphery A complex mechanism for transducing pressure variations in the air to neural impulses in auditory nerve fibers
Traveling wave • Different frequencies of sound give rise to maximum vibrations at different places along the basilar membrane. • The frequency of vibration at a given place is equal to that of the nearest stimulus component (resonance). • Hence, the cochlea performs a frequency analysis.
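One common way to make this place-frequency mapping concrete is Greenwood's (1990) cochlear function; the sketch below uses his published human constants, and is an illustration rather than the slide's own model:

```python
# Greenwood's (1990) place-frequency function f = A * (10**(a*x) - k),
# with his human constants; x is normalized basilar-membrane position
# measured from the apex (0 = apex, 1 = base).
import numpy as np

def greenwood_cf(x, A=165.4, a=2.1, k=0.88):
    """Best frequency (Hz) at normalized basilar-membrane position x."""
    return A * (10.0 ** (a * np.asarray(x)) - k)

# Low frequencies peak at the apex, high frequencies at the base.
print(greenwood_cf([0.0, 0.5, 1.0]))   # approx. [20, 1710, 20680] Hz
```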
Cochlear filtering model • The gammatone function $g(t) = t^{\,n-1}\, e^{-2\pi b t}\, \cos(2\pi f_0 t + \phi)$ approximates physiologically-recorded impulse responses • n = filter order (4) • b = bandwidth • f0 = centre frequency • φ = phase
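A minimal sketch of the gammatone impulse response built from the parameters above; the ERB-based default bandwidth and its 1.019 scale factor follow a common convention (Glasberg & Moore, 1990) and are assumptions, not something stated on the slide:

```python
# Gammatone impulse response
#   g(t) = t**(n-1) * exp(-2*pi*b*t) * cos(2*pi*f0*t + phi),  t >= 0
import numpy as np

def gammatone_ir(f0, fs, n=4, b=None, phi=0.0, duration=0.05):
    """Impulse response of one gammatone filter centred at f0 (Hz)."""
    if b is None:
        # Conventional choice: bandwidth proportional to the ERB at f0,
        # ERB(f0) = 24.7 * (4.37*f0/1000 + 1), scaled by 1.019.
        b = 1.019 * 24.7 * (4.37 * f0 / 1000 + 1)
    t = np.arange(int(duration * fs)) / fs
    g = t ** (n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * f0 * t + phi)
    return g / np.max(np.abs(g))              # peak-normalize for plotting

ir = gammatone_ir(f0=1000, fs=16000)          # gamma-shaped envelope, 1 kHz carrier
```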
Gammatone filterbank • Each position on the basilar membrane is simulated by a single gammatone filter with appropriate centre frequency and bandwidth. • A small number of filters (e.g. 32) is generally sufficient to cover the range 50 Hz to 8 kHz. • Note the variation in bandwidth with frequency (unlike Fourier analysis).
Response to a pure tone • Many channels respond, but those closest to the target tone frequency respond most strongly (place coding). • The interval between successive peaks also encodes the tone frequency (temporal coding). • Note propagation delay along the membrane model.
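A sketch of the pure-tone response, assuming ERB-rate spacing of the centre frequencies (Glasberg & Moore, 1990): the channel whose centre frequency lies nearest the tone carries the most energy, which is the place code; the inter-peak interval of each channel's output carries the temporal code.

```python
# Drive a small gammatone filterbank with a 1 kHz tone and find the
# strongest-responding channel (place coding). ERB-rate channel spacing
# and all parameters are illustrative assumptions.
import numpy as np

def gammatone_ir(f0, fs, n=4, duration=0.05):
    b = 1.019 * 24.7 * (4.37 * f0 / 1000 + 1)          # ERB-based bandwidth
    t = np.arange(int(duration * fs)) / fs
    return t ** (n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * f0 * t)

def erb_space(f_low, f_high, n_channels):
    """Centre frequencies equally spaced on the ERB-rate scale."""
    e = np.linspace(21.4 * np.log10(0.00437 * f_low + 1),
                    21.4 * np.log10(0.00437 * f_high + 1), n_channels)
    return (10.0 ** (e / 21.4) - 1) / 0.00437

fs = 16000
cfs = erb_space(50, 8000, 32)                 # 32 channels, 50 Hz to 8 kHz
t = np.arange(fs // 10) / fs                  # 100 ms
tone = np.sin(2 * np.pi * 1000 * t)           # 1 kHz pure tone

energies = [np.sum(np.convolve(tone, gammatone_ir(cf, fs)) ** 2) for cf in cfs]
print("strongest channel near %.0f Hz" % cfs[int(np.argmax(energies))])
```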
Spectrogram vs. cochleogram • Spectrogram • Plot of log energy across time and frequency (linear frequency scale) • 'Cochleogram' • Cochlear filtering by the gammatone filterbank (or other models of cochlear filtering) • Quasi-logarithmic frequency scale, and filter bandwidth is frequency-dependent • Previous work suggests better resilience to noise than the spectrogram
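A hedged sketch of one way to compute a cochleogram: gammatone filtering, half-wave rectification, leaky-integrator smoothing, 10 ms framing, and log compression. The smoothing constant and frame size are illustrative choices, not prescribed by the slide:

```python
# Cochleogram: log-compressed energy envelopes of a gammatone filterbank,
# arranged as a (channels x frames) image analogous to a spectrogram.
import numpy as np
from scipy.signal import lfilter

def cochleogram(x, fs, n_channels=32, f_low=50.0, f_high=8000.0):
    frame = fs // 100                                  # 10 ms frames
    # ERB-rate-spaced centre frequencies
    e = np.linspace(21.4 * np.log10(0.00437 * f_low + 1),
                    21.4 * np.log10(0.00437 * f_high + 1), n_channels)
    cfs = (10.0 ** (e / 21.4) - 1) / 0.00437
    t = np.arange(int(0.05 * fs)) / fs                 # 50 ms impulse responses
    rows = []
    for cf in cfs:
        b = 1.019 * 24.7 * (4.37 * cf / 1000 + 1)      # ERB-based bandwidth
        ir = t ** 3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * cf * t)
        y = np.convolve(x, ir)[:len(x)]
        env = lfilter([0.01], [1, -0.99], np.maximum(y, 0))   # rectify + smooth
        k = len(env) // frame
        rows.append(np.log(env[:k * frame].reshape(k, frame).mean(axis=1) + 1e-9))
    return np.array(rows)
```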
Beyond the periphery [Figure: the auditory system (Source: Arbib, 1989)] • The auditory system is complex: four relay stations between the periphery and the cortex, rather than one as in the visual system • Compared to the auditory periphery, the central parts of the auditory system are less well understood • The number of neurons in the primary auditory cortex is comparable to that in the primary visual cortex, even though the auditory nerve has far fewer fibers than the optic nerve (thousands vs. millions)
Auditory scene analysis (ASA) • Listeners are capable of parsing an acoustic scene to form a mental representation of each sound source – a stream – in the perceptual process of auditory scene analysis (Bregman, 1990) • From events to streams • Two conceptual processes of ASA: • Segmentation • Decompose the acoustic mixture into sensory elements (segments) • Grouping • Combine segments into streams, so that segments in the same stream originate from the same source • Two sorts of temporal organization • Simultaneous • Sequential
Simultaneous organization • Groups sound components that overlap in time. • Some cues for simultaneous organization • Proximity in frequency (spectral proximity) • Common periodicity • Harmonicity • Fine temporal structure • Common spatial location • Common onset (and to a lesser degree, common offset) • Common temporal modulation • Amplitude modulation • Frequency modulation
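As a concrete illustration of two of these cues, the sketch below keeps components that are near-integer multiples of a candidate f0 (harmonicity) and that start together (common onset); the component list, f0, and tolerances are invented for the example:

```python
# Group sensory components by harmonicity and common onset.
def group_simultaneous(components, f0, onset_tol=0.03, harm_tol=0.03):
    """components: list of (frequency_hz, onset_s); returns one group."""
    ref_onset = components[0][1]
    group = []
    for freq, onset in components:
        h = round(freq / f0)                              # nearest harmonic number
        is_harmonic = h >= 1 and abs(freq / f0 - h) < harm_tol * h
        same_onset = abs(onset - ref_onset) < onset_tol
        if is_harmonic and same_onset:
            group.append((freq, onset))
    return group

# Harmonics of 200 Hz that start together group; the inharmonic 530 Hz
# component and the late-onset 800 Hz component are excluded.
comps = [(200, 0.10), (400, 0.11), (601, 0.10), (530, 0.10), (800, 0.45)]
print(group_simultaneous(comps, f0=200))   # [(200, 0.1), (400, 0.11), (601, 0.1)]
```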
Sequential organization • Groups sound components across time. • Some cues for sequential organization: • Proximity in time and frequency • Temporal and spectral continuity • Common spatial location; more generally, spatial continuity • Smooth pitch contour • Rhythmic structure • Rhythmic attention theory (Large and Jones, 1999)
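A minimal sketch of sequential grouping by frequency proximity: each incoming tone joins the stream whose most recent tone is nearest in log-frequency, or starts a new stream. The 5-semitone threshold is an illustrative stand-in for the proximity cue; the two test sequences echo the classic alternating-tone streaming demonstrations (van Noorden, 1975):

```python
# Assign successive tones to streams by log-frequency proximity.
import math

def stream_by_proximity(tones_hz, max_semitones=5.0):
    streams = []                                   # each stream: list of freqs
    for f in tones_hz:
        best, best_dist = None, float("inf")
        for s in streams:
            d = abs(12 * math.log2(f / s[-1]))     # distance in semitones
            if d < best_dist:
                best, best_dist = s, d
        if best is not None and best_dist <= max_semitones:
            best.append(f)                         # join the nearest stream
        else:
            streams.append([f])                    # start a new stream
    return streams

# A large alternation interval segregates into two streams; a small one
# stays integrated.
print(stream_by_proximity([400, 800, 400, 800, 400, 800]))  # two streams
print(stream_by_proximity([400, 440, 400, 440, 400, 440]))  # one stream
```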
Two processes for grouping • Primitive grouping (bottom-up) • Innate data-driven mechanisms, consistent with those described by Gestalt psychologists for visual perception (proximity, similarity, common fate, good continuation, etc.) • It is domain-general, and exploits intrinsic structure of environmental sound • Grouping cues described earlier are primitive in nature • Schema-driven grouping (model-based or top-down) • Learned knowledge about speech, music and other environmental sounds. • It is domain-specific, e.g. organization of speech sounds into syllables
Organisation in speech: Broadband spectrogram [Figure: broadband spectrogram of "… pure pleasure …", annotated with grouping cues: continuity, onset synchrony, offset synchrony, common AM, harmonicity]
Organisation in speech: Narrowband spectrogram [Figure: narrowband spectrogram of "… pure pleasure …", annotated with grouping cues: continuity, onset synchrony, offset synchrony, harmonicity]
CASA (computational auditory scene analysis) system architecture
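Since the architecture figure itself is not reproduced here, the skeleton below sketches the stage order typical of CASA systems (cf. Wang & Brown, 2006); every stage is a trivial placeholder so the flow runs end to end:

```python
# Schematic CASA pipeline: peripheral analysis -> feature extraction ->
# segmentation -> grouping. Placeholder stages; a real system fills each
# box in with the models discussed on the preceding slides.
def cochlear_filterbank(x, fs):  return [x]          # e.g. gammatone channels
def extract_features(channels):  return channels     # periodicity, onsets, AM/FM
def segment(features):           return features     # contiguous T-F elements
def group(segments):             return [segments]   # simultaneous + sequential cues

def casa(signal, fs):
    """Acoustic mixture in, one stream per inferred sound source out."""
    return group(segment(extract_features(cochlear_filterbank(signal, fs))))

streams = casa([0.0, 0.1, -0.2], fs=16000)           # dummy input, runs end to end
```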
Music cognition Scheirer, E. D., "Bregman's Chimerae: Music Perception as Auditory Scene Analysis"
The goal • "… is to explain the human ability to map incoming acoustic data into emotional, music-theoretical, or other high-level cognitive representations, and to provide evidence from psychological experimentation for these explanations."
A bottom-up model of musical perception and cognition • Boxes contain "facilities" or processes which operate on streams of input and produce streams of output. • Arrows denote these streams and are labeled with a rough indication of the types of information they might contain. • Italicized labels beneath the "music perception" and "music cognition" boxes indicate into which of these categories various musical properties might fall.
More explanation • Acoustic events enter the ear as waves of varying sound-pressure level and are processed by the cochlea into streams of band-passed power levels at various frequencies. • The harmonically-related peaks in the time-frequency spectrum specified by the channels of filterbank output are grouped into "notes" or "complex tones" using auditory grouping rules such as continuation, harmonicity, and common onset time. • Properties of these notes such as timbre, pitch, loudness, and perhaps their rhythmic relationships over time, are determined by a low-level "music perception" facility. • Once the properties of the component notes are known, the relationships they bear to each other and to the ongoing flow of time can be analyzed, and higher-level structures such as melodies, chords, and key centers can be constructed. • These high-level descriptions give rise to the final "emotive" content of the listening experience as well as other forms of high-level understanding and modeling, such as recognition, affective response, and the capacity for theoretical analysis.
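The same flow, written as a schematic pipeline; the stage functions are placeholders for the boxes' "facilities", and only the strictly one-directional order of representations is meant literally:

```python
# Scheirer's bottom-up model as a one-directional data pipeline.
def cochlea(pressure_wave):      return "band-passed power levels"
def grouping(channels):          return "notes / complex tones"
def music_perception(notes):     return "timbre, pitch, loudness, rhythm"
def music_cognition(props):      return "melodies, chords, key centers"
def high_level(structures):      return "emotive content, recognition, analysis"

# Strictly upward flow, the assumption questioned on the next slide.
out = high_level(music_cognition(music_perception(grouping(cochlea("input")))))
```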
One assumption which bears examination • The explicitly mono-directional flow of data from "low-level" processes to "high-level" processes • that is, the implication that higher-level cognitive models have little or no impact on the stages of lower-level processing. • We know from existing experimental data that this upward data-flow model is untrue in particular cases. • For example, frequency contours in melodies can lead to a percept of accent structure, which in turn leads to the belief that the accented notes are louder than the unaccented ones. • Thus, the high-level process of melodic understanding impacts the "lower-level" process of determining the loudness of notes.
Another assumption which bears examination • In computer-music research, the process of turning a digital-audio signal into a symbolic representation of the same musical content is termed the transcription problem, and has received much study. • The assumption that "notes" are the fundamental mental representations of all musical perception and cognition requires that there be a transcription facility in the brain to produce them. • This assumption, and especially the requirement it implies, are largely unsupported by experimental evidence. • We have no percept of most of the individual notes which comprise the chords and rhythms in the densely-scored inner sections of a Beethoven symphonic development. • While highly-trained individuals may be able to "hear out" some of the specific pitches and timbres through a difficult process of listening and deduction, this is surely not the way in which the general experience of hearing music unfolds.
A "top-down" or "prediction-driven" model of music perception and cognition • Boxes again represent processing facilities; • arrows are unlabeled to indicate less knowledge about the exact types of information being passed from box to box. Music Cognition - Jazwinski & Howard - Tulane University
More explanation • In this model, predictions based on the current musical context are compared against the incoming psychoacoustic cues. • Prediction is dependent on what has been previously heard, and what is known about the musical domain from innate constraints and learned acculturation. • The agreements and/or disagreements between prediction and realization are reconciled and reflected in a new representation of the musical situation. • Note that within this model, the types of representations actually present in a mental taxonomy of musical context are as yet unspecified.
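A toy sketch of the predict-compare-reconcile loop; the numeric cues and the update rule are stand-ins for the unspecified mental representations:

```python
# Prediction-driven loop: predict from context, compare with the incoming
# cue, and reconcile the disagreement into an updated context.
def prediction_driven_listen(cues, predict, reconcile, context):
    for cue in cues:
        expected = predict(context)             # from prior hearing + schemas
        error = cue - expected                  # agreement / disagreement
        context = reconcile(context, error)     # new musical situation
    return context

# Toy instance: the context is a running estimate that tracks the cues.
final = prediction_driven_listen(
    cues=[1.0, 1.2, 0.9, 2.5],
    predict=lambda c: c,
    reconcile=lambda c, e: c + 0.5 * e,         # partial belief update
    context=1.0)
print(final)                                    # 1.75
```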
Auditory chimera • One element of the internal representation of music which has been somewhat underexamined is called an auditory chimera by Bregman: • [Music often wants] the listener to accept the simultaneous roll of the drum, clash of the cymbal, and brief pulse of noise from the woodwinds as a single coherent event with its own striking properties. • The sound is chimeric in the sense that it does not belong to any single environmental object. [Bregman 1990 p. 460, emphasis added]
An example • Again arguing from intuition, it seems likely that the majority of the inner-part content of a Beethoven symphony is perceived in exactly this manner. • That is, multiple non-melodic voices are grouped together into a single virtual "orchestral" sound object which has certain properties analogous to "timbre" and "harmonic implication", and which is, crucially, irreducible into perceptually smaller units. • It is the combined and continuing experience of these "chimeric" objects which gives the music its particular quality in the large, that is, what the music "sounds like" on a global level. • In fact, it seems likely that a good portion of the harmonic and textural impact of a piece of complex music is carried by such objects.
Next Monday Prediction in music