Perceptual Organization

Perceptual Organization Perfecto Herrera

Introductory sound examples

Perceptual Organization
“…Perceptual Organization is central to the key question of perception: how do we make the leap from information detected by our sensory receptors… to our perceptions of the world? This requires not just the detection of information but the organization of that information into veridical percepts.”
“…Perceptual organization is the process by which particular relationships among potentially separate elements (including parts, features, and dimensions) are perceived (selected from alternative relationships) and guide the interpretation of those elements… in sum, how we process sensory information in context.”
Pomerantz & Kubovy, 1986

The problem(s) of perceptual organization

Perceptual Organization Attention

Some terms
• Source: the physical entity that gives rise to the sound pressure waves (e.g., a violin being played)
• Stream: the percept of a group of successive and/or simultaneous sounds as a coherent whole appearing to come from a single source (e.g., the brass section)
• The sounds we hear at any one time usually come from a number of different sources.
• In most cases we can hear and identify each of the different sound sources as having its own pitch, timbre, loudness and location (stream = source).
• In other cases several sources are processed as a single stream because their features are not different enough to be heard as distinct (e.g., a string section).
• In other, more exotic cases, a single source may yield several streams.

Auditory Scene Analysis
• A computational theory of hearing is required, together with a functional explanation of the information-processing problems that the auditory system must solve in order to make sense of the acoustic environment
• A computational theory of hearing deals with the questions: What is the purpose of hearing? What constraints and regularities can hearing exploit?
• Work in computer vision has benefited from a computational theory since the late 1970s, due to David Marr
• A similar foundation for hearing was developed by Albert Bregman at McGill University in Montreal and is known as auditory scene analysis

Auditory Scene Analysis
ASA can be conceptualized as a two-stage process:
• The mixture of sounds is decomposed into a collection of sensory elements (onsets, pitch trajectories, modulations, spectral tracks, etc.)
• Elements that are likely to have arisen from the same event are grouped to form a perceptual structure (stream) which can be interpreted by higher centers in the brain
For example, when listening to a violin performance, it is the task of auditory scene analysis to group the acoustic events emitted from the physical source (the violin) into a perceptual stream (the mental experience of a violin being played).
Is this the only way of listening? What about “reduced listening”? Read Pierre Schaeffer.
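The second (grouping) stage of the two-stage view above can be illustrated with a toy sketch. It rests on strong assumptions: the sensory elements are reduced to a list of partial frequencies, the candidate fundamentals are known in advance, and harmonicity is the only cue. The name `group_partials` and the 3% tolerance are illustrative choices, not part of Bregman's account.

```python
def group_partials(partials_hz, f0_candidates_hz, tolerance=0.03):
    """Assign each partial to the candidate fundamental it fits best.

    A partial fits a fundamental if it lies within `tolerance` (relative
    error) of one of that fundamental's harmonics. Each partial is used
    at most once; partials that fit no candidate are left ungrouped.
    """
    groups = {f0: [] for f0 in f0_candidates_hz}
    ungrouped = []
    for f in partials_hz:
        best_f0, best_err = None, tolerance
        for f0 in f0_candidates_hz:
            n = max(1, round(f / f0))           # nearest harmonic number
            err = abs(f - n * f0) / (n * f0)    # relative mistuning
            if err <= best_err:
                best_f0, best_err = f0, err
        if best_f0 is None:
            ungrouped.append(f)
        else:
            groups[best_f0].append(f)
    return groups, ungrouped


groups, ungrouped = group_partials(
    [200, 270, 400, 540, 600, 810, 1130], f0_candidates_hz=[200, 270]
)
print(groups)     # {200: [200, 400, 600], 270: [270, 540, 810]}
print(ungrouped)  # [1130]
```

Here the 810 Hz partial lies close to a harmonic of both candidates but is allocated only to the better fit, and the 1130 Hz partial fits neither series, so it is left as a separate element.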

Auditory Scene Analysis
In most listening situations, a mixture of sounds reaches the ears. However, we can:
• Attend to one conversation amid many competing voices and other background sounds (e.g., music) at a ‘cocktail party’
• Follow the melodic line played by the violins in an orchestral recording
This problem is of great scientific interest, and a solution also has engineering applications: the Holy Grail!
“Auditory image” of Bach’s Mass in B minor, consisting of voice, violin, cello, etc.
How does the auditory system process this image to recover a description of each source?

Active Perception: expectation-based processing (bottom-up + top-down)

Auditory Scene Analysis
• The inner ear separates sound into its frequency components
• At some point in the auditory system these components need to be assigned to the appropriate sound source
• This is often called “perceptual grouping” or “auditory scene analysis”
• Two aspects: simultaneous grouping and sequential grouping

Auditory Scene Analysis
• Simultaneous grouping: the grouping together of the simultaneous frequency components that come from a single source
• Sequential grouping: the connecting over time of the changing frequencies that a single source produces from one moment to the next

Example: simultaneous grouping and sequential grouping

Antecedents: Gestalt Psychology
• Gestalt means “pattern” in German
• Gestalt psychology originated in the early 20th century: Max Wertheimer (1880-1943), Wolfgang Köhler (1887-1967) and Kurt Koffka (1886-1941)
• The basic principles underlying Gestalt psychology are:
  • The whole is greater than the sum of the parts
  • The parts are defined by the whole as much as vice versa
• Gestalt psychologists are best known for their work in vision, but their principles are also applicable to auditory perception.
• They systematically developed a set of principles of perceptual organisation (believed to be innate) that they thought determine how we assemble or associate components in a perceptual field
• These principles are…

Gestalt Psychology Principles
• Proximity
• Similarity
• Common Fate (Common Direction)
• Good Continuation
• Disjoint Allocation (Belongingness)
• Closure
Bottom-up: hard-wired, pre-attentive, not learned (primitive)
Top-down: plastic, learned (schema-driven)

Proximity
• In vision, when elements in an image are close together they are perceived as belonging together and as separate from others that are further away, even though all the elements are similar
• In hearing, sounds occurring close together in time are clustered

Similarity
• Two or more auditory events are grouped if they are similar in timbre, pitch or loudness, or close in apparent location or time
• If the fundamentals are in the same region but the harmonics are not, the result is fission: different timbres with the same pitch are heard as unfused
• If the harmonics are in the same region but the fundamentals are not, the result is fusion: different pitches with the same timbre are heard as fused
• This is not clear-cut: it depends on individual differences.
• If the difference in loudness is large enough, the sounds form different streams, and either can be attended to
• Same dB → single stream at twice the tempo

Common Fate
• The components of a sound act together
• They tend to start and finish together
• They tend to change in pitch or intensity together
• Therefore, if the components of a complex sound are co-ordinated, they are fused; relevant cues include onset disparities and shared AM and FM (tremolo and vibrato)
• For example, if the frequencies of harmonics 2, 4 and 8 are modulated (FM), they separate from harmonics 3, 5, 6 and 7
• Or, if the frequency of the 1st harmonic is modulated (FM) at a different rate, it separates from harmonics 3, 4 and 5

Good Continuation
• Natural sound sources tend to change gradually rather than abruptly in frequency, intensity, location or timbre
• Abrupt change → new stream → new source
• Low and high tones tend to split into streams; this can be suppressed by putting glides in between
• In speech, oscillations in frequency give the impression that there are two speakers saying the one word
• In music in general, if a note is near in pitch to the one just before it, it will be heard as the next note in the melody rather than as a separate note, higher or lower

Disjoint Allocation (Belongingness)
• One component can only come from one source, i.e. hearing tries to use each component only once
• Say we have two tones, A and B, at slightly different pitches, and these can be heard either in isolation or embedded in another series of pitches. In isolation, the order (AB or BA) is easily judged.
• Adding pitches (X’s) that are close in pitch to A and B creates distractors that make it difficult to order AB (this is thought to be because we attend more to the start and end of sequences).
• But if more X’s are added, they form a stream that is separate from AB, and again the order of AB is easily judged.
• This is not hard and fast: ambiguity is possible, which shows that this level of organisation is on the boundary between pre-attentive and attentive processing
• It also shows how the addition of new elements changes the perceptual organisation of the stimulus.
When is it more difficult to tell if A sounded before B? (Assume a fast tempo.)

Closure
• A source may be obscured or absent, but its percept continues
• E.g., FM radio with interference from the ignition of passing cars: we hear a click over the sound, whereas in fact the radio is producing only a click and the sound is off
• A pitched sound that is broken seems unbroken if the gap is filled by noise
• Similarly, a glide that is broken seems unbroken if the gap is filled with noise
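The broken-tone case above can be tried out by synthesising the two versions of the stimulus. This is a minimal sketch using only the Python standard library; the function name `interrupted_tone`, the 440 Hz tone, the gap from 0.4 to 0.6 s and the amplitudes are all assumed values chosen for illustration, not taken from a specific experiment.

```python
import math
import random


def interrupted_tone(fill_with_noise, freq=440.0, duration=1.0,
                     gap=(0.4, 0.6), tone_amp=0.3, noise_amp=0.9,
                     sr=8000, seed=0):
    """Return samples of a tone with a gap that is silent or noise-filled."""
    rng = random.Random(seed)          # fixed seed: reproducible noise burst
    gap_start = round(gap[0] * sr)
    gap_end = round(gap[1] * sr)
    samples = []
    for i in range(round(duration * sr)):
        if gap_start <= i < gap_end:
            # Inside the gap the tone is physically absent in both versions.
            if fill_with_noise:
                samples.append(noise_amp * rng.uniform(-1.0, 1.0))
            else:
                samples.append(0.0)
        else:
            samples.append(tone_amp * math.sin(2 * math.pi * freq * i / sr))
    return samples


silent_gap = interrupted_tone(fill_with_noise=False)  # tone is heard as broken
noisy_gap = interrupted_tone(fill_with_noise=True)    # tone tends to seem continuous
```

The noise is made louder than the tone on purpose: the slide's claim concerns a gap "filled" by noise, which presupposes noise that could plausibly have masked the tone.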

Auditory Scene Analysis
Bregman re-examines the Gestalt principles and proposes the simultaneous and sequential grouping cues as the basic elements of information that help to organize our perception: what, when, where, how.
Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, Mass.: The MIT Press.
But see also: Wang, D. & Brown, G. (Editors) (2006). Computational Auditory Scene Analysis: Principles, Algorithms and Applications. New York: Wiley.

Example of CASA-based auditory segmentation
An Auditory Scene Analysis Approach to Speech Segregation, Wang (2005)

Simultaneous Grouping Sequential Grouping

Simultaneous grouping
Some cues:
• Fundamental Frequency and Spectral Regularity
• Onset Timing
• Correlated changes in Amplitude or Frequency
• Sound Location
Important: a single cue may not be effective all the time; these cues work together for the perceptual organisation of the input sound

Fundamental Frequency
• Consider two musical instruments each playing a note simultaneously
• It is easier to hear each note and each instrument if they are playing different notes (i.e. have different fundamental frequencies)
• Simultaneous sounds are more likely to fuse if they have the same fundamental frequency
• It has been shown that a pair of simultaneously presented vowels is easier to identify if their fundamental frequencies differ

Spectral Regularity
• The frequency components of a harmonic sound fuse perceptually (harmonicity) and are heard as a single sound
• If a frequency component does not form part of the harmonic series, it tends to be heard out separately, as if it were part of a different source

Onset disparities
• Perceptual separation of tones is enhanced by onset asynchrony.
• A frequency component that starts or stops at a different time from the complex sound is less likely to be heard as part of it than if it is simultaneous with it
• This is important for making a “soloist” stand out

Onset disparities
• We can hear each of two ‘simultaneously’ played notes more easily if there is a small onset difference between them
• These onset asynchronies are up to 30 ms, so the percept is still of the notes sounding together
• The auditory system can exploit these onset differences even though we are not consciously aware of them
• Ensemble playing: is it ever completely synchronised?
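To listen to the effect of a small onset difference, the two notes can be synthesised with a controllable delay. This is a minimal sketch, assuming pure sine tones in place of real instrument notes; `two_notes`, the two frequencies and the sample rate are illustrative choices, and only the 30 ms figure comes from the slide.

```python
import math


def two_notes(onset_gap_ms=30.0, f_a=440.0, f_b=660.0, duration=0.5, sr=8000):
    """Mix two sine 'notes'; note B starts onset_gap_ms after note A."""
    delay = round(onset_gap_ms * sr / 1000.0)   # onset gap in samples
    mix = []
    for i in range(round(duration * sr)):
        a = math.sin(2 * math.pi * f_a * i / sr)
        # Note B is silent until its own onset, then starts at zero phase.
        b = math.sin(2 * math.pi * f_b * (i - delay) / sr) if i >= delay else 0.0
        mix.append(0.5 * (a + b))               # scale to stay within [-1, 1]
    return mix


synchronous = two_notes(onset_gap_ms=0.0)
asynchronous = two_notes(onset_gap_ms=30.0)
```

Comparing the two mixes by ear is the point of the exercise; the code itself makes no claim about which one will be easier to hear out.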

Correlated Changes in Amplitude or Frequency
• A sound may be perceptually segregated from an unchanging background if its components are modulated in amplitude or frequency
• Sound example: a harmonic complex tone
  • Harmonics 1, 3, 5, 6 and 7 remain steady
  • Harmonics 2, 4 and 8 rise and fall in frequency four times
  • The two sets are heard as separate sounds
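A stimulus of the kind described above can be approximated in a few lines. This is a sketch under assumptions: the 200 Hz fundamental, the 5% modulation depth, the sinusoidal shape of the frequency excursion and the name `common_fate_complex` are all illustrative, since the slide only states which harmonics move and how many times.

```python
import math


def common_fate_complex(f0=200.0, modulated=(2, 4, 8), steady=(1, 3, 5, 6, 7),
                        depth=0.05, cycles=4, duration=1.0, sr=8000):
    """Harmonic complex in which one subset of harmonics shares the same FM."""
    rate = cycles / duration                 # modulation rate in Hz
    count = len(modulated) + len(steady)
    samples = []
    for i in range(round(duration * sr)):
        t = i / sr
        # Time integral of depth * sin(2*pi*rate*t). Adding it to t gives each
        # modulated harmonic k the instantaneous frequency
        # k * f0 * (1 + depth * sin(2*pi*rate*t)).
        excursion = depth * (1.0 - math.cos(2 * math.pi * rate * t)) / (2 * math.pi * rate)
        value = 0.0
        for k in steady:
            value += math.sin(2 * math.pi * k * f0 * t)
        for k in modulated:
            value += math.sin(2 * math.pi * k * f0 * (t + excursion))
        samples.append(value / count)        # keep the sum within [-1, 1]
    return samples


stimulus = common_fate_complex()
```

Because all modulated harmonics share one excursion term, their frequencies rise and fall in the same proportion at the same moments, which is the "common fate" the slide refers to. The samples can be written to a file with the standard-library `wave` module for listening.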

Sound Location
• Sounds coming from different locations in space are generally assumed to be from different sources
• But… this is a weak cue for simultaneous grouping; it becomes stronger for sequential grouping

Sequential organisation
• Events in the world occur over time. We organise sounds into sequences over time using various criteria
• Events that are similar in some way (e.g. in loudness or pitch) or going in the same direction (e.g. rising or falling) are perceived to have the same origin.
• Music uses this principle
• Streams are created by differences in pitch, loudness, timbre, repetition rate, etc., and by combining these in different ways.
• Characteristics of streams:
  • Streams are separate: we only attend fully to one at a time.
  • Foreground and background: possibly 3 at most
  • Stream organisation is relative rather than absolute
  • Stream organisation may change as the complexity of the stimulus changes
  • Some aspects of streaming are pre-attentive, others are attentive; attentive means that by attending to different aspects of a stimulus we hear different things

Sequential grouping
• Periodicity cues: periodic oscillations help to group objects according to their rates
• Spectral cues: we tend to group in time elements that appear in the same spectral regions (e.g., high partials vs. low partials)
• Level (intensity) cues: we tend to group in time elements of similar level
• Spatial cues: we tend to group in time elements coming from the same place
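The spectral cue in the list above can be sketched as a greedy assignment of successive tones to streams. This is a deliberately crude illustration, not a model from the literature: it uses frequency proximity only, ignores level, location and tempo, and the 4-semitone limit and the name `assign_streams` are assumed for the example.

```python
import math


def assign_streams(tone_freqs_hz, max_jump_semitones=4.0):
    """Greedily link each tone to the stream whose last tone is nearest in pitch.

    A tone starts a new stream when every existing stream ended more than
    max_jump_semitones away from it. Returns a list of streams, each a list
    of (position_in_sequence, frequency) pairs.
    """
    streams = []
    for idx, freq in enumerate(tone_freqs_hz):
        best, best_dist = None, max_jump_semitones
        for stream in streams:
            last_freq = stream[-1][1]
            dist = abs(12 * math.log2(freq / last_freq))  # distance in semitones
            if dist <= best_dist:
                best, best_dist = stream, dist
        if best is None:
            streams.append([(idx, freq)])
        else:
            best.append((idx, freq))
    return streams


wide = assign_streams([400, 1000, 400, 1000, 400])   # splits into two streams
narrow = assign_streams([400, 450, 400, 450])        # stays one stream
```

With the wide alternation, the low tones (positions 0, 2, 4) and the high tones (positions 1, 3) end up in separate streams; with the narrow one, all four tones are linked into a single stream.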

Features Important for Sequential Grouping
• Spectral distribution (old+new heuristic)

Streaming
• What happens when pitch separation and/or repetition rate are varied?
• If we compress the time dimension, do we still hear notes that are further apart in frequency as belonging together?
• This was tested by Van Noorden (1976, 1977), who found:
  • Segregation depends on repetition rate and pitch separation
  • When stream segregation occurs, we are unable to attend fully to the events in both streams at the same time
  • We find it difficult to distinguish the order of events across streams
  • We have trouble hearing the overall rhythm of the sequence
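The two parameters named above, pitch separation and repetition rate, can be made explicit in a small stimulus generator. The sketch below assumes a repeating A-B-A-silence triplet, a pattern commonly used in streaming experiments; the slide does not say which stimulus was used, and `aba_sequence` and its default values are hypothetical.

```python
def aba_sequence(base_hz=500.0, separation_semitones=7.0, tone_ms=100.0, triplets=3):
    """Return (onset_ms, frequency_hz) events for repeated A-B-A-silence triplets.

    separation_semitones sets the pitch distance between A and B;
    tone_ms sets the onset-to-onset interval, i.e. the repetition rate.
    """
    high_hz = base_hz * 2 ** (separation_semitones / 12)
    events = []
    for r in range(triplets):
        t0 = r * 4 * tone_ms                     # A, B, A, then one silent slot
        events.append((t0, base_hz))
        events.append((t0 + tone_ms, high_hz))
        events.append((t0 + 2 * tone_ms, base_hz))
    return events


for onset, freq in aba_sequence(separation_semitones=12.0, triplets=2):
    print(onset, freq)
```

Shrinking `tone_ms` compresses the time dimension, and raising `separation_semitones` moves the B tones further from the A tones, so both questions on the slide can be explored by rendering the events as tones and listening.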

Streaming
• Frequency and temporal contiguity: auditory streaming
[Figure axis: frequency separation]

The Figure-Ground Phenomenon and Attention
• Generally we do not attend to every aspect of the auditory input: certain parts are selected for conscious analysis
• A complex sound is analysed into streams, and we attend to one stream at a time; the attended stream stands out perceptually while the rest of the sound becomes less salient
• Separation into attended and unattended streams is equivalent to the ‘figure-ground phenomenon’
• Examples: attending to one conversation at a time at a party while other conversations form a background; music with soloists; TV in a noisy home…
• Importance of changes: the listener’s attention is usually drawn to aspects of the sound that are changing; these become figure while the relatively unchanging part(s) become background

Guess who wrote this text: “It is not enough to be able to describe the response of single cells, nor predict the results of psychophysical experiments. Nor is it enough even to write computer programs that perform approximately in the desired way: One has to do all these things at once, and also be very aware of the computational theory...”

This presentation reused materials from educational and research slides and documents by • Dan Ellis • Guy Brown • Niall Griffith • Rianna Walsh • Chris Darwin • Sue Denham
