A Model of Saliency-Based Visual Attention for Rapid Scene Analysis

Laurent Itti, Christof Koch, and Ernst Niebur • Reporter: You Jian

Introduction • Visual attention: • Focus of attention • Focus selection • Rapid, saliency-driven, task-independent (bottom-up) • Slow, volition-controlled, task-dependent (top-down)

Model

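Model • Input image size: usually 640 × 480 • Nine spatial scales [0..8] are created using dyadic Gaussian pyramids. • Center is a pixel at scale $c \in \{2, 3, 4\}$ • Surround is the corresponding pixel at scale $s = c + \delta$, with $\delta \in \{3, 4\}$ • Across-scale difference $\ominus$: interpolation to the finer scale and point-by-point subtraction.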
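
The pyramid and the across-scale difference can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: a 2×2 box average stands in for the Gaussian low-pass filter, nearest-neighbour lookup stands in for interpolation, and the function names (`downsample`, `build_pyramid`, `upsample_to`, `across_scale_diff`) are hypothetical, not taken from the paper.

```python
def downsample(img):
    """Halve both dimensions by averaging 2x2 blocks.
    Assumption: a box filter replaces the Gaussian filter of the real model."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def build_pyramid(img, levels=9):
    """Return the list [scale 0, scale 1, ...]; stops early if the image runs out."""
    pyramid = [img]
    for _ in range(levels - 1):
        if len(pyramid[-1]) < 2 or len(pyramid[-1][0]) < 2:
            break
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def upsample_to(coarse, h, w):
    """Bring a coarse map to h x w by nearest-neighbour lookup (a crude interpolation)."""
    ch, cw = len(coarse), len(coarse[0])
    return [[coarse[y * ch // h][x * cw // w] for x in range(w)] for y in range(h)]


def across_scale_diff(pyramid, c, s):
    """Center scale c minus surround scale s, computed at the resolution of scale c."""
    center = pyramid[c]
    h, w = len(center), len(center[0])
    surround = upsample_to(pyramid[s], h, w)
    return [[center[y][x] - surround[y][x] for x in range(w)] for y in range(h)]
```

On a uniform image every across-scale difference is zero, which matches the intuition that a location only stands out when it differs from its surround.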

Extraction of Early Visual Features • r, g, and b are the red, green, and blue channels of the input image. • Intensity image: $I = (r + g + b)/3$ • The r, g, and b channels are normalized by I in order to decouple hue from intensity. • Hue variations are not perceivable at very low luminance. • Normalization is only applied at locations where I is larger than 1/10 of its maximum over the entire image. Other locations yield zero r, g, and b.
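
A minimal sketch of the intensity computation and the thresholded normalization follows. It assumes the intensity definition $I = (r + g + b)/3$ and the 1/10-of-maximum threshold from the original paper; the function names and the flat list-of-pixels layout are hypothetical choices made for brevity.

```python
def intensity(r, g, b):
    """Intensity as the mean of the three channels."""
    return (r + g + b) / 3.0


def normalize_rgb(pixels):
    """pixels: list of (r, g, b) tuples.
    Returns (intensities, normalized pixels). Pixels whose intensity is not
    above 1/10 of the image maximum are set to zero instead of being divided."""
    intensities = [intensity(*p) for p in pixels]
    threshold = max(intensities) / 10.0
    normalized = []
    for (r, g, b), i in zip(pixels, intensities):
        if i > threshold:
            normalized.append((r / i, g / i, b / i))
        else:
            normalized.append((0.0, 0.0, 0.0))
    return intensities, normalized
```

The threshold also protects against division by zero: a pixel with zero intensity can never exceed the threshold.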

Four broadly-tuned color channels
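
The channel formulas did not survive in the slide text. The sketch below uses the definitions given in the Itti, Koch, and Niebur paper, with negative values set to zero; the function name `color_channels` is hypothetical, and r, g, b are assumed to be the intensity-normalized values.

```python
def color_channels(r, g, b):
    """Return the broadly-tuned (R, G, B, Y) responses for one pixel.
    Negative responses are clipped to zero."""
    R = max(0.0, r - (g + b) / 2.0)
    G = max(0.0, g - (r + b) / 2.0)
    B = max(0.0, b - (r + g) / 2.0)
    Y = max(0.0, (r + g) / 2.0 - abs(r - g) / 2.0 - b)
    return R, G, B, Y
```

Each channel responds most strongly to its own pure hue, and all four are zero for gray or white input (r = g = b).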

Create Gaussian Pyramids • Gaussian pyramids $I(\sigma)$, $R(\sigma)$, $G(\sigma)$, $B(\sigma)$, and $Y(\sigma)$ are created for I, R, G, B, and Y, with $\sigma \in [0..8]$

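Feature Maps-Intensity Contrast • Feature maps: center-surround differences ($\ominus$) between a “center” fine scale c and a “surround” coarser scale s yield the feature maps. • Intensity contrast: $I(c, s) = |I(c) \ominus I(s)|$ • A set of six maps, for $c \in \{2, 3, 4\}$ and $s = c + \delta$, $\delta \in \{3, 4\}$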

Feature Maps-Color Double Opponent • Spatial and chromatic opponency in human primary visual cortex. • red/green and green/red • blue/yellow, and yellow/blue color pairs
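
The two opponency maps can be written out explicitly. The equations below restate the definitions from the Itti, Koch, and Niebur paper, since the formulas themselves are missing from the slide text; $R$, $G$, $B$, $Y$ denote the color pyramids and $\ominus$ the across-scale difference (interpolation to the finer scale, then point-by-point subtraction).

```latex
\begin{align}
RG(c, s) &= \left| \bigl(R(c) - G(c)\bigr) \ominus \bigl(G(s) - R(s)\bigr) \right| \\
BY(c, s) &= \left| \bigl(B(c) - Y(c)\bigr) \ominus \bigl(Y(s) - B(s)\bigr) \right|
\end{align}
```

With three center scales and two surround offsets, each of the two opponency types yields six maps.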

Feature Maps-Orientation • Obtained from I using oriented Gabor pyramids $O(\sigma, \theta)$, with $\theta \in \{0^\circ, 45^\circ, 90^\circ, 135^\circ\}$ • In total, 42 feature maps are computed: • 6 for intensity • 12 for color • 24 for orientation
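
The count of 42 follows from the scale pairs. The short check below assumes the center scales {2, 3, 4}, surround offsets {3, 4}, two color opponency types, and four Gabor orientations described in the original paper; the constant names are illustrative.

```python
CENTER_SCALES = (2, 3, 4)          # center scale c
DELTAS = (3, 4)                    # surround scale s = c + delta
COLOR_PAIRS = ("RG", "BY")         # red/green and blue/yellow opponency
ORIENTATIONS = (0, 45, 90, 135)    # Gabor orientations in degrees

# Every (center, surround) pair produces one map per feature type.
scale_pairs = [(c, c + d) for c in CENTER_SCALES for d in DELTAS]

counts = {
    "intensity": len(scale_pairs),
    "color": len(COLOR_PAIRS) * len(scale_pairs),
    "orientation": len(ORIENTATIONS) * len(scale_pairs),
}
total = sum(counts.values())
```

Here `scale_pairs` holds the six pairs (2, 5), (2, 6), (3, 6), (3, 7), (4, 7), (4, 8), and `total` is 6 + 12 + 24.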

The Saliency Map • The saliency map: • Represents the conspicuity (or “saliency”) at every location in the visual field by a scalar quantity • Guides the selection of attended locations, based on the spatial distribution of saliency.

The Saliency Map • Combination of the feature maps • Provides bottom-up input to the saliency map • Modeled as a dynamical neural network. • Difficulty: the feature maps are not directly comparable • Different dynamic ranges • Different extraction mechanisms • Salient objects appearing strongly in only a few maps may be masked by noise or by less salient objects present in a larger number of maps

The Map Normalization • Normalization: • Promotes maps with a small number of strong peaks of activity (conspicuous locations). • Suppresses maps with numerous comparable peak responses.

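The Map Normalization • The normalization operator $N(\cdot)$ consists of: • Normalizing the values in the map to a fixed range [0..M] • Finding the location of the map's global maximum M and computing the average $\bar{m}$ of all its other local maxima • Globally multiplying the map by $(M - \bar{m})^2$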
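
The normalization operator $N(\cdot)$ from the original paper can be sketched as below. Several details are assumptions made for illustration: min-max scaling is used to reach the range [0..M], a local maximum is a cell strictly greater than its 4-connected neighbours, and the global maximum is assumed to be one of those strict local maxima. The names `local_maxima` and `normalize_map` are hypothetical.

```python
def local_maxima(m):
    """Values of all cells strictly greater than their 4-connected neighbours."""
    h, w = len(m), len(m[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            neighbours = [m[ny][nx]
                          for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                          if 0 <= ny < h and 0 <= nx < w]
            if all(m[y][x] > n for n in neighbours):
                peaks.append(m[y][x])
    return peaks


def normalize_map(m, M=1.0):
    """Scale m to [0..M], then weight it by (M - mean of the other local maxima)^2."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    if hi == lo:                       # flat map: nothing stands out
        return [[0.0 for _ in row] for row in m]
    scaled = [[(v - lo) / (hi - lo) * M for v in row] for row in m]
    # Drop the largest peak (assumed to be the global maximum), average the rest.
    others = sorted(local_maxima(scaled), reverse=True)[1:]
    m_bar = sum(others) / len(others) if others else 0.0
    factor = (M - m_bar) ** 2
    return [[v * factor for v in row] for row in scaled]
```

A map with one isolated peak keeps its full amplitude, while a map with two equally strong peaks is suppressed entirely, which is the promotion and suppression behaviour the operator is meant to produce.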

The Map Normalization • Biological motivation • Lateral inhibition mechanisms. • Neighboring similar features inhibit each other via specific, anatomically defined connections.

Combination of Feature Maps • Feature maps are combined into three “conspicuity maps”: • $\bar{I}$ for intensity • $\bar{C}$ for color • $\bar{O}$ for orientation

Combination of Feature Maps • The across-scale addition operator $\oplus$ consists of: • Reduction of each map to scale four • Point-by-point addition
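
Written out, the combination step takes the following form. These equations restate the definitions from the Itti, Koch, and Niebur paper and are not present in the slide text; $N(\cdot)$ is the map normalization operator, $\oplus$ the across-scale addition, and $S$ the final input to the saliency map.

```latex
\begin{align}
\bar{I} &= \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} N\bigl(I(c, s)\bigr) \\
\bar{C} &= \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4}
           \bigl[ N\bigl(RG(c, s)\bigr) + N\bigl(BY(c, s)\bigr) \bigr] \\
\bar{O} &= \sum_{\theta \in \{0^\circ, 45^\circ, 90^\circ, 135^\circ\}}
           N\Bigl( \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} N\bigl(O(c, s, \theta)\bigr) \Bigr) \\
S &= \tfrac{1}{3} \bigl( N(\bar{I}) + N(\bar{C}) + N(\bar{O}) \bigr)
\end{align}
```

Each conspicuity map is normalized once more before averaging, so that no single feature type dominates the saliency map purely because of its dynamic range.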

Combination of Feature Maps • Motivation for creating three separate channels and normalizing them individually: • Similar features compete strongly for saliency • Different modalities contribute independently to the saliency map

Focus Selection • The saliency map (SM) is modeled as a 2D layer of leaky integrate-and-fire neurons at scale four. • The potential of SM neurons at more salient locations hence increases faster. • Each SM neuron excites its corresponding winner-take-all (WTA) neuron. • All WTA neurons evolve independently until one first reaches the threshold and fires.

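Focus Selection • That triggers three simultaneous mechanisms: • The focus of attention (FOA) is shifted to the location of the winner neuron • Global inhibition of the WTA resets all WTA neurons • Local inhibition of the SM around the winner (inhibition of return)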
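
The shift-and-inhibit cycle can be approximated without simulating neuron dynamics. The sketch below is a deliberate simplification, not the model's mechanism: picking the maximum of the map stands in for the first WTA neuron to fire, and zeroing a square neighbourhood stands in for local inhibition. The function name `attend` and the parameters `n_shifts` and `radius` are hypothetical.

```python
def attend(saliency, n_shifts=3, radius=1):
    """Return the sequence of attended (row, col) locations.
    Assumption: argmax replaces the integrate-and-fire / WTA dynamics."""
    sm = [row[:] for row in saliency]      # work on a copy of the saliency map
    h, w = len(sm), len(sm[0])
    foci = []
    for _ in range(n_shifts):
        # Winner: the most salient location; the FOA shifts there.
        _, wy, wx = max((sm[y][x], y, x) for y in range(h) for x in range(w))
        foci.append((wy, wx))
        # Local inhibition: suppress the neighbourhood of the winner so the
        # next shift goes to a different location.
        for y in range(max(0, wy - radius), min(h, wy + radius + 1)):
            for x in range(max(0, wx - radius), min(w, wx + radius + 1)):
                sm[y][x] = 0.0
    return foci
```

Because the attended region is suppressed after each shift, the focus visits locations in order of decreasing saliency instead of returning to the same peak.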
