Studying Visual Attention with the Visual Search Paradigm Marc Pomplun

Studying Visual Attention with the Visual Search Paradigm Marc Pomplun Department of Computer Science University of Massachusetts at Boston E-mail: marc@cs.umb.edu Homepage: http://www.cs.umb.edu/~marc/

Overview: • The Feature Integration Theory • Visual Search • The Guided Search Theory • The Area Activation Model

The Binding Problem • Different features of the visual scene are coded by separate systems • e.g., direction of motion, location, color and orientation • How do we know this? • Anatomical & neurophysiological evidence • Brain Imaging (fMRI & PET) • So how do we experience a coherent world?

Feature Integration Theory (Treisman et al.) • Attention is used to bind features together • Code one object at a time on the basis of its location • Bind together whatever features are attended at that location

Feature Integration Theory • Sensory “features” (color, size, orientation, etc.) are coded in parallel by specialized modules • Modules form two kinds of “maps” • Feature maps (e.g., color maps, orientation maps, etc.) • A master map of locations

Feature Integration Theory • Feature maps contain two kinds of information: - presence of a feature anywhere in the field (“there’s something red out there”) - implicit spatial information about the feature • Activity in the feature maps can tell us which features are contained in the visual scene. • It cannot tell us which other features the “green blob” has. • The master map codes the location of features.

Feature Integration Theory The basic idea of the FIT is that visual attention is used for • Locating features • Binding appropriate features together There are two stages of object perception: • Preattentive stage: Individual features are extracted in parallel across the whole visual scene. • Attentive stage: When attention is directed to a location, the local features are combined to form a whole.

Feature Integration Theory • Attention moves within the location map • Focus of attention selects whatever features are linked to that location • Features of other objects are excluded • Attended features are then entered into the current temporary object representation
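The two stages described above can be illustrated with a small sketch. This is not an implementation from the FIT literature; the `Item` class, the function names, and the three-item display are hypothetical and chosen only to make the idea concrete. Feature maps are built for the whole display at once (the preattentive stage), and features are bound into one object only when a location is attended (the attentive stage).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical display item with a location and two feature dimensions."""
    location: tuple
    color: str
    shape: str


def build_feature_maps(items):
    """Preattentive stage: one map per feature value, listing where it occurs."""
    maps = {}
    for item in items:
        for dimension in ("color", "shape"):
            value = getattr(item, dimension)
            maps.setdefault((dimension, value), set()).add(item.location)
    return maps


def feature_present(maps, dimension, value):
    """Answerable without attention: is this feature anywhere in the field?"""
    return bool(maps.get((dimension, value)))


def attend(maps, location):
    """Attentive stage: bind whatever features are registered at one location."""
    return {dim: val for (dim, val), locs in maps.items() if location in locs}


display = [
    Item((0, 0), "red", "X"),
    Item((1, 0), "green", "T"),
    Item((2, 0), "red", "T"),
]
maps = build_feature_maps(display)

print(feature_present(maps, "color", "red"))  # feature check, no binding needed
print(attend(maps, (2, 0)))                   # binding requires a location
```

Note that `feature_present` can report "something red" and "some T" without saying whether they belong to the same item; only `attend` answers that question, one location at a time.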

Feature Integration Theory Empirical evidence for the FIT has been obtained through • Visual search tasks • Illusory conjunctions We will focus on the paradigm of visual search.

Visual Search

Feature Search • Is there a red T in the display? • Target defined by a single feature • According to FIT, this should not demand attention • Target should “pop out” [Figure: example search display consisting of T's]

Conjunction Search • Is there a red T in the display? • Target is now defined by its shape and color • This involves binding features and so should demand attention • Need to attend to each item until target is found [Figure: example search display consisting of T's and X's]

Feature Search Changing the number of distractors: [Figure: feature-search display consisting of T's]

Conjunction Search Changing the number of distractors: [Figure: conjunction-search display consisting of T's and X's]

Visual Search Experiments • Record time taken to determine whether target is present or not • Vary the number of distractors • Search for features should be independent of the number of distractors • Conjunction search should get slower with more distractors

Visual Search • Feature targets pop out → flat display size function • Conjunction targets demand serial search → significant slope
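The difference between a flat function and a significant slope can be made concrete with a toy calculation. The sketch below assumes a serial, self-terminating search for conjunction targets (on average about half of the items are inspected when the target is present, all of them when it is absent). The timing parameters `base_ms` and `ms_per_item` are illustrative values, not measurements.

```python
def expected_rt_feature(set_size, base_ms=450.0):
    """Pop-out: expected response time does not depend on the number of items."""
    return base_ms


def expected_rt_conjunction(set_size, target_present,
                            base_ms=450.0, ms_per_item=50.0):
    """Serial self-terminating search (assumed): items are inspected one by one."""
    inspected = (set_size + 1) / 2 if target_present else set_size
    return base_ms + ms_per_item * inspected


def slope(xs, ys):
    """Least-squares slope of ys against xs (ms per item)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


set_sizes = [4, 8, 16, 32]
feature_slope = slope(set_sizes, [expected_rt_feature(n) for n in set_sizes])
present_slope = slope(
    set_sizes, [expected_rt_conjunction(n, True) for n in set_sizes])
absent_slope = slope(
    set_sizes, [expected_rt_conjunction(n, False) for n in set_sizes])

print(feature_slope, present_slope, absent_slope)
```

Under these assumptions the feature-search slope is zero, while the target-absent conjunction slope is twice the target-present slope, because every item has to be checked before answering "absent".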

Problem with FIT: Pop-Out of Conjunction Targets • A moving X pops out of a display of moving O’s and static X’s • Target is defined by a conjunction of movement and form • At least some conjunctions do not require focal attention [Figure: example display consisting of X's and O's]

Guided Search Theory The Guided Search Theory (GST) is similar to the FIT in that it also assumes two successive stages of visual search performance: • a preattentive, parallel stage • an attentive, serial stage However, the main difference from FIT is that GST assumes that the preattentive stage yields spatial saliency information, which is used to guide attention in the serial stage.

Guided Search Theory According to GST, saliency is encoded in an additional map, called the saliency map. The saliency map is created during the preattentive stage and can combine multiple features if necessary. In the subsequent serial search process, attention is first directed to the highest “peak” in the saliency map, then to the second-highest, and so on. This visual guidance allows efficient search even for some conjunction targets.
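The peak-by-peak search described above can be sketched as follows. This is a simplification, not the GST itself: saliency here is just a weighted count of the features an item shares with the target, and the weights, display, and function names are hypothetical.

```python
def saliency_map(items, target, weights):
    """Preattentive stage: score each location by weighted feature matches."""
    return {
        location: sum(weights[dim] for dim in target
                      if features[dim] == target[dim])
        for location, features in items.items()
    }


def guided_search(items, target, weights):
    """Serial stage: visit locations from the highest saliency peak downwards.

    Returns (location of target or None, number of items inspected).
    """
    saliency = saliency_map(items, target, weights)
    order = sorted(saliency, key=lambda loc: -saliency[loc])
    for visited, location in enumerate(order, start=1):
        if items[location] == target:
            return location, visited
    return None, len(order)


# Hypothetical conjunction display: the target is the only red T.
items = {
    (0, 0): {"color": "green", "shape": "T"},
    (1, 0): {"color": "red", "shape": "X"},
    (2, 0): {"color": "green", "shape": "T"},
    (3, 0): {"color": "red", "shape": "T"},
    (4, 0): {"color": "green", "shape": "T"},
}
target = {"color": "red", "shape": "T"}
weights = {"color": 1.0, "shape": 0.5}  # assumed: color guides more strongly

found_at, inspected = guided_search(items, target, weights)
print(found_at, inspected)
```

Because the target matches on both dimensions, it forms the highest peak and is inspected first, which is how guidance can make a conjunction search efficient.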

Guided Search Theory Support for the GST comes from eye-movement research. Eye-movement recording allows researchers to determine the items that a subject looks at during visual search.

Guided Search Theory [Figure: example search display with recorded fixations]

Guided Search Theory In the previous example, • 80% of fixations were closest to an item sharing color with the target, • 20% of fixations were closest to an item sharing orientation with the target. It seems that the color dimension is guiding the subject’s visual search process. Of course, due to imprecision of eye movements and their measurement, better statistics are necessary to determine the guiding dimension.
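A minimal sketch of how such percentages could be computed: each fixation is assigned to its closest item, and we count how often that item shares color or orientation with the target. The item positions and fixation coordinates below are made up so that the result reproduces an 80% / 20% split; they are not data from the lecture.

```python
import math


def nearest_item(fixation, items):
    """Return the display item whose position is closest to the fixation."""
    return min(items, key=lambda item: math.dist(fixation, item["pos"]))


def saccadic_selectivity(fixations, items, target):
    """Proportion of fixations landing nearest to an item sharing each feature."""
    dimensions = ("color", "orientation")
    counts = {dim: 0 for dim in dimensions}
    for fixation in fixations:
        item = nearest_item(fixation, items)
        for dim in dimensions:
            if item[dim] == target[dim]:
                counts[dim] += 1
    return {dim: counts[dim] / len(fixations) for dim in dimensions}


# Hypothetical target and distractors; each distractor shares one feature.
target = {"color": "red", "orientation": "vertical"}
distractors = [
    {"pos": (0, 0), "color": "red", "orientation": "horizontal"},
    {"pos": (10, 0), "color": "green", "orientation": "vertical"},
    {"pos": (0, 10), "color": "red", "orientation": "horizontal"},
    {"pos": (10, 10), "color": "green", "orientation": "vertical"},
]
fixations = [(1, 1), (0.5, 9), (2, 8), (1, 0), (9, 9)]  # made-up gaze samples

selectivity = saccadic_selectivity(fixations, distractors, target)
print(selectivity)
```

With real data, the nearest-item assignment is exactly where the imprecision of eye movements and their measurement enters, so proportions like these need to be tested statistically before a guiding dimension is inferred.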

Guided Search Theory In visual search tasks, subjects are usually guided by one target feature or a combination of target features. This supports the idea of GST that preattentively derived information from multiple dimensions guides and thereby facilitates the subsequent serial search process.

Guided Search Theory There are two problems with GST: • According to GST, grouping the guiding distractors should result in reduced guidance (less bottom-up activation). However, the opposite happens. • There is no quantitative implementation of a Guided Search model that could predict guidance, i.e., saccadic selectivity for a given search task. To overcome these problems, we proposed the Area Activation Model of saccadic selectivity in visual search tasks.

Area Activation Assumptions: • Processing resources during a fixation are distributed like a two-dimensional Gaussian function centered at fixation. • Fixation positions are chosen to allow a maximum of information processing according to the assumed processing resources. • Scan paths are chosen in such a way that they connect the optimal fixation positions with minimal eye-movement cost (path length).
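The three assumptions can be illustrated with a rough sketch. This is not the published Area Activation Model: the greedy peak picking, the nearest-neighbor path ordering, the grid, and all parameter values are simplifying assumptions made for illustration. Each guiding item contributes a Gaussian to an activation map, fixation positions are taken at well-separated activation peaks, and the peaks are then ordered into a short scan path. The number of fixations is passed in as an input.

```python
import math


def activation(pos, items, sigma):
    """Sum of 2-D Gaussians (width sigma) centered on the guiding items."""
    return sum(
        math.exp(-math.dist(pos, item) ** 2 / (2 * sigma ** 2))
        for item in items
    )


def choose_fixations(items, n_fixations, sigma, grid_step=1.0, extent=10):
    """Greedily pick the highest-activation grid points that lie more than
    2 * sigma apart (an assumed stand-in for 'optimal' fixation positions)."""
    steps = int(extent / grid_step) + 1
    grid = [(i * grid_step, j * grid_step)
            for i in range(steps) for j in range(steps)]
    ranked = sorted(grid, key=lambda p: -activation(p, items, sigma))
    chosen = []
    for p in ranked:
        if all(math.dist(p, q) > 2 * sigma for q in chosen):
            chosen.append(p)
        if len(chosen) == n_fixations:
            break
    return chosen


def scan_path(start, fixations):
    """Nearest-neighbor ordering: a cheap approximation of a minimal-length path."""
    remaining = list(fixations)
    path, current = [], start
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(nxt)
        path.append(nxt)
        current = nxt
    return path


# Hypothetical guiding items forming two clusters.
guiding_items = [(2, 2), (2, 3), (3, 2), (1, 2), (8, 8), (8, 7), (7, 8)]
peaks = choose_fixations(guiding_items, n_fixations=2, sigma=1.0)
path = scan_path((10, 10), peaks)
print(peaks, path)
```

In this sketch a cluster of guiding items produces a higher peak than an isolated item, so grouped items attract fixations more strongly rather than less.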

Area Activation - Strong Guidance

Area Activation - Weak Guidance

Area Activation - Empirical Results

Area Activation Problems with the Area Activation Model: • The empirical number of fixations per trial needs to be known in advance. • Only very basic factors influencing visual search have been implemented so far. Nevertheless, Area Activation can be considered a first step towards a quantitative model of visual search.

Conclusions We have discussed how the visual search paradigm can be employed to investigate the mechanisms of visual attention. Various models of attention have been developed and evaluated with visual search tasks; in more recent studies, this was done based on eye-movement data. In the next lecture, we will look at slightly different paradigms, which are aimed at identifying factors that determine visual scan paths. See you then!
