
Visual Attention: Selective tuning and saliency computation using game theory

Presentation Transcript


    1. Visual Attention: Selective tuning and saliency computation using game theory. Presentation prepared by Alexandre Bernardino, VisLab-ISR-IST.

    2. The General Vision Problem (oversimplified)

    3. Visual Attention Components (from Tsotsos et al.): selection of a region of interest in the visual field; selection of feature dimensions and values of interest; control of information flow through the network of neurons that constitutes the visual system; shifting from one selected region to the next in time; transformation of task information into attentional instructions; integration of successive attentional fixations; interactions with memory; indexing into model bases.

    4. The Selective Tuning Model. Localizes interesting regions in the visual field. Assumes “interestingness” values can be easily computed for each item, depending on the task definition. Reduces computation by using a visual pyramid. Addresses some problems with pyramid representations.

    5. Visual Pyramids. Small receptive fields at the bottom and large receptive fields at the top; receptive fields may overlap. At each site and scale, the information is interpreted by “interpretive units” of different types. Each interpretive unit may receive feedback, feedforward and lateral inputs (etc.) from other units. Pyramids solve part of the complexity problem but introduce other problems.

    6. Benefits of Visual Pyramids

    7. Problems with information flow due to pyramidal processing (1) The Context Effect: Units at the top of the pyramid receive input from a very large sub-pyramid and are confounded by the surroundings of the attended object.

    8. Problems with information flow due to pyramidal processing (2) Blurring: A single event at the input affects an inverted sub-pyramid of units and gets blurred as it flows upwards, so that a large portion of the output layer ends up representing part of that one event.
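The blurring effect can be reproduced with a toy one-dimensional pyramid. This is only a sketch under assumptions that are not in the slides: each level is a 3-wide moving average with stride 1, and the hypothetical helper `average_level` stands in for whatever the interpretive units compute.

```python
import numpy as np

def average_level(signal, width=3):
    """One pyramid level: each unit averages `width` adjacent inputs (stride 1)."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="valid")

# A single event at the input layer.
level = np.zeros(15)
level[7] = 1.0

# Fraction of units at each higher level that respond to that single event.
affected_fraction = []
for _ in range(4):
    level = average_level(level)
    affected_fraction.append(np.count_nonzero(level) / level.size)

print(affected_fraction)
```

With these settings the fraction of responding units grows at every level until the whole top layer is affected by the one input event.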

    9. Problems with information flow due to pyramidal processing (3) Cross-talk: Two separate visual events activate two inverted subpyramids that may overlap. Thus one event interferes with the interpretation of the other.

    10. Problems with information flow due to pyramidal processing (4) Boundary: Central items appear stronger than peripheral items because the number of upgoing connections is larger for central items.
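The boundary effect can be seen by counting connections in a toy layer. The sketch below assumes overlapping 3-wide receptive fields with stride 1 and no padding; `upgoing_connections` is a hypothetical helper, not part of the model described in the slides.

```python
import numpy as np

def upgoing_connections(n_inputs, width=3):
    """Count, for each input unit, how many units in the next layer it feeds."""
    n_outputs = n_inputs - width + 1
    counts = np.zeros(n_inputs, dtype=int)
    for j in range(n_outputs):
        # Output unit j reads input units j .. j + width - 1.
        counts[j:j + width] += 1
    return counts

counts = upgoing_connections(9)
print(counts)
```

Units near the border feed fewer units in the next layer than central ones, so a peripheral item contributes less to the layer above.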

    11. Tsotsos et al Selective Tuning Architecture

    12. WTA Units. $I_{l,k}$: the interpretive unit in assembly k in layer l; $G_{l,k,j}$: the jth WTA gating unit, in assembly k in layer l, linking $I_{l,k}$ with $I_{l-1,j}$; $g_{l,k}$: the gating control unit for the WTA over the inputs to $I_{l,k}$; $b_{l,k}$: the bias unit for $I_{l,k}$; $q_{l,j,i}$: weight applied to $I_{l-1,i}$ in the computation of $I_{l,j}$; $n_{l,x}$: scale normalization factor; $M_{l,k}$: the set of gating units for $I_{l,k}$; $U_{l+1,k}$: the set of gating units in layer l+1 making feedback connections to $g_{l,k}$; $B_{l+1,k}$: the set of bias units in layer l+1 making feedback connections to $b_{l,k}$.

    13. Selective Tuning Overview. Build the pyramid. Compute the MAX (Winner-Take-All, WTA) at the highest level to determine the globally most salient items; a top-down bias can be introduced externally. Inhibit units that are not in the winner's receptive field. The process continues down to the bottom of the pyramid. As the pruning of connections proceeds downwards, interpretive units are recomputed and propagated upwards. BENEFITS: WTAs are computed on “small” regions. RESULT: Selects (segments) a region that fulfils the saliency definition at all scales.
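The top-down pass outlined above can be sketched in a few lines. This is a deliberately simplified illustration, not the Selective Tuning implementation: it assumes non-overlapping 2x2 receptive fields, uses a plain `argmax` in place of the iterative WTA with gating units, and omits the upward recomputation of interpretive units after pruning. The function names are hypothetical.

```python
import numpy as np

def build_pyramid(image, levels):
    """Each level is the 2x2 block average of the level below it."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = prev.shape[0] // 2, prev.shape[1] // 2
        blocks = prev[:2 * h, :2 * w].reshape(h, 2, w, 2)
        pyramid.append(blocks.mean(axis=(1, 3)))
    return pyramid

def select_top_down(pyramid):
    """Pick the global winner at the top, then follow it down level by level."""
    top = pyramid[-1]
    r, c = np.unravel_index(np.argmax(top), top.shape)
    path = [(int(r), int(c))]
    for level in reversed(pyramid[:-1]):
        # Only the 2x2 receptive field of the current winner stays active;
        # every other unit at this level is treated as inhibited.
        field = level[2 * r:2 * r + 2, 2 * c:2 * c + 2]
        dr, dc = np.unravel_index(np.argmax(field), field.shape)
        r, c = 2 * r + dr, 2 * c + dc
        path.append((int(r), int(c)))
    return path

image = np.zeros((8, 8))
image[5, 2] = 1.0  # one bright item
path = select_top_down(build_pyramid(image, levels=4))
print(path)  # winner coordinates from the top level down to the input
```

Each WTA here runs over only four units, which is the benefit named on the slide: the competition at every level is restricted to the winner's receptive field.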

    14. Information Routing

    15. Results: Brightness and Orientation. Brightness: saliency is the largest and brightest item; features are the average gray level on rectangles [6,50]x[6,50]; each pyramid level is the local average of the previous level. Orientation: saliency is the longest and highest-contrast straight line; features are edges with orientations [0, 45, 90, 135] and sizes [3,35]x[3,35]; pyramid [128,108,80,48,28].

    16. Results: Motion. Simulated optic flow. Matching (correlation) against 16 templates of motion patterns. Pyramid: computes the local average; 4 levels.

    17. What is missing? Salient features are predefined and very simple, e.g. the brightest and largest item, the longest and highest-contrast line, or the best-matching item in a database. Conjunction of features is ad hoc: (1) WTA within each feature dimension; (2) WTA across the winners of step 1; (3) the overall winner selects the attended region.

    18. Visual Attention Using Game Theory. Compute salient locations in multi-feature spaces. Each point (x,y) is associated with a unit vector of multiple features, e.g. color and brightness: $n_{x,y} = (R_{x,y}, G_{x,y}, B_{x,y}, I_{x,y})^T / N$. Task knowledge is incorporated as a top-down bias, given by a desired feature unit vector $w = (R_d, G_d, B_d, I_d)^T / M$. Salient regions in an image are defined as being similar to the desired feature vector and distinct from their neighbors: $w^T n_{A'} > w^T n_A$, with $A' \subset A$ and $n_A = \sum_{(x,y) \in A} n_{x,y}$. That is, the subregion $A'$ matches the wanted feature better than its surroundings $A$.
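A minimal sketch of the feature vectors and the region comparison follows. Several choices here are assumptions rather than anything stated on the slide: intensity I is taken as the mean of R, G and B, the region vector n_A is rescaled to unit length so that regions of different sizes are comparable, and the helper names are hypothetical.

```python
import numpy as np

def unit_features(rgb):
    """Map an H x W x 3 RGB image to unit vectors (R, G, B, I) / N per pixel."""
    rgb = np.asarray(rgb, dtype=float)
    intensity = rgb.mean(axis=2, keepdims=True)   # assumed definition of I
    features = np.concatenate([rgb, intensity], axis=2)
    norms = np.linalg.norm(features, axis=2, keepdims=True)
    return features / np.maximum(norms, 1e-12)    # avoid division by zero

def region_match(features, w, rows, cols):
    """w^T n_A, with n_A taken as the normalized sum of the vectors in A."""
    n_A = features[rows, cols].sum(axis=(0, 1))
    n_A = n_A / max(np.linalg.norm(n_A), 1e-12)
    return float(w @ n_A)

image = np.zeros((6, 6, 3))
image[..., 2] = 1.0          # blue background
image[2:4, 2:4] = [1, 0, 0]  # red patch
n = unit_features(image)

w = np.array([1.0, 0.0, 0.0, 1.0 / 3.0])  # desired feature: pure red
w /= np.linalg.norm(w)

inner = region_match(n, w, slice(2, 4), slice(2, 4))  # subregion A'
outer = region_match(n, w, slice(0, 6), slice(0, 6))  # enclosing region A
print(inner > outer)
```

In this toy image the red patch plays the role of A' and the full image the role of A, so the patch satisfies the saliency condition for a "red" top-down bias.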

    19. The Market. N actors (points); K types of available goods. Actor i has an allocation of goods $n_i \in \mathbb{R}^K$. Let $f(n_i)$ be the utility of a certain allocation of goods. Each agent will trade to get the $z_i$ that solves $\max_{z_i} \left( f(z_i) - p^T (z_i - n_i) \right)$, where $p$ is the price vector.
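A short worked step, assuming that f is differentiable and that the maximization is unconstrained (neither assumption is stated on the slide): setting the gradient of the objective to zero gives the price at which an actor has no incentive to trade further.

```latex
\[
\nabla_{z_i}\bigl( f(z_i) - p^{T}(z_i - n_i) \bigr) = \nabla f(z_i) - p = 0
\quad\Longrightarrow\quad
p = \nabla f(z_i^{*})
\]
```

For a concave f this stationary point is a maximum, and when the optimum is the neighborhood average the condition reads p = f'(n_av).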

    20. The Feature Market. If $f(n_i)$ is a concave function, then the market reaches competitive equilibrium: $n_i = n_{av}$ within a neighborhood. $f(n_i) = w^T n_i$ is a concave function. A “fair” price is defined as $p = f'(n_{av}) = w - n_{av} n_{av}^T w = (I - n_{av} n_{av}^T) w = A w$, where $A$ is the projection onto the orthogonal complement of $n_{av}$.
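The price formula can be checked numerically. The sketch below assumes that n_av is rescaled to unit length, which is what makes A a true projection; `fair_price` is a hypothetical helper name.

```python
import numpy as np

def fair_price(w, n_av):
    """Return A = I - n_av n_av^T and the price p = A w."""
    n_av = n_av / np.linalg.norm(n_av)   # A is a projection only for unit n_av
    A = np.eye(n_av.size) - np.outer(n_av, n_av)
    return A, A @ w

w = np.array([1.0, 0.0, 0.0, 0.0])       # desired feature direction
n_av = np.array([1.0, 1.0, 1.0, 1.0])    # neighborhood average (unnormalized)
A, p = fair_price(w, n_av)
n_unit = n_av / np.linalg.norm(n_av)

print(p)            # price vector
print(A @ n_unit)   # numerically zero: A removes the average direction
```

The price is the part of w that is orthogonal to the neighborhood average, so a feature that is already common in the neighborhood is priced low.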

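    21. Saliency = Wealth. Capital of actor i: $C_i = p^T (n_i - n_{av}) = w^T A (n_i - n_{av}) = w^T A n_i$. The last step uses $A n_{av} = 0$, which holds because $A$ projects onto the orthogonal complement of $n_{av}$.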

    22. Interesting Things. Normalization: matrix $A$ enhances directions with fewer items. Salience can be split into two terms. Intrinsic salience, independent of the task: $S_i = A n_i$. Extrinsic salience, dependent on the top-down bias: $S_e = w^T S_i$.
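The split into intrinsic and extrinsic salience can be illustrated on a small set of feature vectors. This sketch assumes the neighborhood is the whole set of points and that n_av is normalized to unit length; the `saliency` function and the three-dimensional toy features are illustrative only.

```python
import numpy as np

def saliency(features, w):
    """Intrinsic (A n_i) and extrinsic (w^T A n_i) salience for unit vectors n_i.

    features: array of shape (num_points, K), one unit vector per point.
    The neighborhood is assumed to be the whole set of points.
    """
    n_av = features.mean(axis=0)
    n_av = n_av / np.linalg.norm(n_av)
    A = np.eye(features.shape[1]) - np.outer(n_av, n_av)
    intrinsic = features @ A      # A is symmetric, so each row equals A n_i
    extrinsic = intrinsic @ w
    return intrinsic, extrinsic

# Nine points share one feature direction and one point differs.
common = np.array([0.0, 0.0, 1.0])
rare = np.array([1.0, 0.0, 0.0])
features = np.vstack([np.tile(common, (9, 1)), rare])

intrinsic, extrinsic = saliency(features, w=rare)
strength = np.linalg.norm(intrinsic, axis=1)
print(int(np.argmax(strength)), int(np.argmax(extrinsic)))
```

The rare point has the largest intrinsic salience without any task input, which is the normalization effect noted above; the top-down vector w then only re-weights the intrinsic term.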
