1. Visual Attention: Selective tuning and saliency computation using game theory. Presentation prepared by Alexandre Bernardino, VisLab-ISR-IST.
2. The General Vision Problem (oversimplified) Images. Features/Saliency. Context. Attention/Selection. Recognition/Inference.
Can all vision problems be described by this diagram?
3. Visual Attention Components (from Tsotsos et al) Selection of a region of interest in the visual field
Selection of feature dimensions and values of interest
Control of information flow through the network of neurons that constitute the visual system
The shifting from one selected region to the next in time.
Transformation of task information into attentional instructions
Integration of successive attentional fixations
Interactions with memory
Indexing into model bases
4. The Selective Tuning Model Localizes interesting regions in the visual field.
Assumes that interestingness values can be easily computed for each item, depending on the task definition.
Reduces computation by utilizing a visual pyramid
Addresses some problems with pyramid representations
5. Visual Pyramids Small receptive fields at the bottom and large receptive fields at the top may overlap.
At each site and scale, the information is interpreted by interpretive units of different types.
Each interpretive unit may receive feedback, feedforward, and lateral interactions (etc.) from other units.
Pyramids solve part of the complexity problem but introduce other problems.
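To make the pyramid idea concrete, here is a minimal sketch of a pyramid built by repeated local averaging, so each level halves the resolution and receptive fields grow toward the top. This is an illustrative toy, not the model's actual implementation:

```python
import numpy as np

def build_average_pyramid(image, levels):
    """Build a visual pyramid by 2x2 local averaging: each level
    halves the resolution, so a single unit at a coarse level
    'sees' a growing receptive field in the input image."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = prev.shape[0] // 2, prev.shape[1] // 2
        # Average each non-overlapping 2x2 block of the finer level.
        coarse = prev[:2 * h, :2 * w].reshape(h, 2, w, 2).mean(axis=(1, 3))
        pyramid.append(coarse)
    return pyramid

levels = build_average_pyramid(np.ones((16, 16)), 4)
print([p.shape for p in levels])  # [(16, 16), (8, 8), (4, 4), (2, 2)]
```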
6. Benefits of Visual Pyramids
7. Problems with information flow due to pyramidal processing (1) The Context Effect:
Units at the top of the pyramid receive input from a very large sub-pyramid and are confounded by the surroundings of the attended object.
8. Problems with information flow due to pyramidal processing (2) Blurring:
A single event at the input affects an inverted sub-pyramid of units and gets blurred as it flows upwards, so that a large portion of the output represents only a part of it.
9. Problems with information flow due to pyramidal processing (3) Cross-talk:
Two separate visual events activate two inverted subpyramids that may overlap. Thus one event interferes with the interpretation of the other.
10. Problems with information flow due to pyramidal processing (4) Boundary:
Central items appear stronger than peripheral items, since the number of upward connections is larger for central objects.
11. Tsotsos et al Selective Tuning Architecture
12. WTA Units I_{l,k}: the interpretive unit in assembly k in layer l;
G_{l,k,j}: the j-th WTA gating unit in assembly k in layer l, linking I_{l,k} with I_{l-1,j};
g_{l,k}: the gating control unit for the WTA over the inputs to I_{l,k};
b_{l,k}: the bias unit for I_{l,k};
q_{l,j,i}: the weight applied to I_{l-1,i} in the computation of I_{l,j};
n_{l,x}: scale normalization factor;
M_{l,k}: the set of gating units for I_{l,k};
U_{l+1,k}: the set of gating units in layer l+1 making feedback connections to g_{l,k};
B_{l+1,k}: the set of bias units in layer l+1 making feedback connections to b_{l,k}
13. Selective Tuning Overview Build the pyramid
Compute a MAX (Winner-Take-All) at the top level to determine the globally most salient items. Top-down bias can be externally introduced.
Inhibit units not in the winner's receptive field.
The process continues down to the bottom of the pyramid.
As the pruning of connections proceeds downwards, interpretive units are recomputed and propagated upwards.
BENEFITS:
WTAs are computed on small regions.
RESULT:
Selects (segments) a region that fulfils the saliency definition at all scales.
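The top-down pass above can be sketched as follows. This is a simplified toy (a plain argmax descent through 2x2 receptive fields), not the full gating-unit WTA network of the model:

```python
import numpy as np

def selective_tuning_trace(pyramid):
    """Sketch of the top-down WTA pass: pick the global winner at
    the top level, then at each lower level restrict the search to
    the winner's receptive field (here, the 2x2 block that feeds
    the winning unit), down to the bottom of the pyramid."""
    path = []
    top = pyramid[-1]
    r, c = np.unravel_index(np.argmax(top), top.shape)  # global WTA
    path.append((len(pyramid) - 1, r, c))
    for level in range(len(pyramid) - 2, -1, -1):
        # Units outside the winner's receptive field are ignored
        # (the analogue of inhibiting them).
        block = pyramid[level][2 * r:2 * r + 2, 2 * c:2 * c + 2]
        dr, dc = np.unravel_index(np.argmax(block), block.shape)
        r, c = 2 * r + dr, 2 * c + dc
        path.append((level, r, c))
    return path

# Toy input: one bright pixel at (5, 6) in an 8x8 image.
img = np.zeros((8, 8)); img[5, 6] = 1.0
pyr = [img]
for _ in range(2):
    p = pyr[-1]
    h, w = p.shape[0] // 2, p.shape[1] // 2
    pyr.append(p.reshape(h, 2, w, 2).mean(axis=(1, 3)))
path = selective_tuning_trace(pyr)
print(path[-1])  # (0, 5, 6): the trace ends at the bright pixel
```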
14. Information Routing
15. Results: Brightness and Orientation Brightness:
Saliency: the largest and brightest
Features: Average gray level on rectangles [6,50]x[6,50]
Pyramid: local average of previous level.
Orientation:
Saliency: the longest and highest contrast straight line
Features: edges with orientations [0, 45, 90, 135] and sizes [3,35]x[3,35]
Pyramid [128,108,80,48,28]
16. Results: Motion Simulated optic flow:
Matching (correlation) against 16 templates of motion patterns
Pyramid: computes local average. 4 levels.
17. What is missing ? Salient features are predefined and very simple:
Ex:
The brightest and largest item.
The largest and highest contrast line.
The best matching item with a database.
Conjunction of features is ad hoc:
WTA within each feature dimension
WTA across the winners of step 1
The overall winner selects the attended region
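A minimal sketch of this ad hoc conjunction scheme, using hypothetical feature maps (not the original code):

```python
import numpy as np

def conjunction_wta(feature_maps):
    """Ad hoc feature conjunction: a WTA within each feature
    dimension picks one winner per map, then a WTA across those
    winners selects the attended region."""
    winners = []
    for name, fmap in feature_maps.items():
        idx = np.unravel_index(np.argmax(fmap), fmap.shape)  # per-dimension WTA
        winners.append((fmap[idx], name, idx))
    value, name, idx = max(winners)  # WTA across the winners
    return name, idx

# Hypothetical saliency maps for two feature dimensions.
maps = {
    "brightness": np.array([[0.2, 0.9], [0.1, 0.3]]),
    "orientation": np.array([[0.5, 0.4], [0.7, 0.2]]),
}
print(conjunction_wta(maps))  # ('brightness', (0, 1))
```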
18. Visual Attention Using Game Theory Compute salient locations on multi-feature spaces:
Each point (x,y) is associated with a unit vector of multiple features, e.g. color and brightness: n_{x,y} = (R_{x,y}, G_{x,y}, B_{x,y}, I_{x,y})^T / N
Incorporates task knowledge (top-down bias) as a desired feature unit vector:
w = (R_d, G_d, B_d, I_d)^T / M
Salient regions in an image are defined as being similar to the desired feature vector and distinct from their neighbors:
w^T n_A > w^T n_{A'}, with A ⊂ A' and n_A = Σ_{(x,y) ∈ A} n_{x,y}
The subregion matches the wanted feature better than its surrounding.
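Under these definitions, checking whether a subregion is salient amounts to comparing inner products with the desired feature vector w. A toy example with made-up 4-channel (R, G, B, I) unit vectors:

```python
import numpy as np

def unit(v):
    """Normalize a feature vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Hypothetical feature vectors (values invented for illustration).
w = unit([1.0, 0.1, 0.1, 0.8])     # desired (top-down) feature vector
n_A = unit([0.9, 0.2, 0.1, 0.7])   # summed features of subregion A
n_Ap = unit([0.3, 0.8, 0.6, 0.2])  # summed features of surround A'

# A is salient if it matches w better than its surround does.
print(w @ n_A > w @ n_Ap)  # True
```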
19. The Market N actors (points)
K types of available goods
Actor i has an allocation of goods n_i ∈ R^K.
Let f(n_i) be the utility of a given allocation of goods.
Each agent trades to obtain the z_i that solves:
max_{z_i} f(z_i) − p^T (z_i − n_i), where p is the price vector.
20. The Feature Market If f(n_i) is a concave function, then the market reaches a competitive equilibrium:
n_i = n_av within a neighborhood.
f(n_i) = w^T n_i is a concave function.
A fair price is defined as
p = w − n_av n_av^T w = (I − n_av n_av^T) w = A w
i.e. the projection of w onto the orthogonal complement of n_av.
21. Saliency = Wealth Capital of actor i:
C_i = p^T (n_i − n_av) = w^T A (n_i − n_av) = w^T A n_i (since A n_av = 0)
22. Interesting Things Normalization: matrix A enhances directions with fewer items.
Salience can be split into two terms:
Intrinsic salience, independent of the task: S_i = A n_i
Extrinsic salience, depending on the top-down bias: S_e = w^T S_i
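The price, capital, and the intrinsic/extrinsic salience split can be verified numerically. The feature vectors below are invented for illustration:

```python
import numpy as np

def unit(v):
    """Normalize a feature vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Hypothetical 4-channel feature vectors (R, G, B, I).
w = unit([1.0, 0.1, 0.1, 0.8])     # top-down desired feature vector
n_i = unit([0.9, 0.2, 0.1, 0.7])   # actor i's allocation of goods
n_av = unit([0.5, 0.5, 0.5, 0.5])  # neighborhood average (unit vector)

# Fair price via the projection A = I - n_av n_av^T.
A = np.eye(4) - np.outer(n_av, n_av)
p = A @ w

C_i = p @ (n_i - n_av)  # capital (wealth) of actor i
S_i = A @ n_i           # intrinsic salience: independent of the task
S_e = w @ S_i           # extrinsic salience: top-down weighted
# Since A n_av = 0, the capital equals the extrinsic salience.
print(np.isclose(C_i, S_e))  # True
```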