
Saliency-based Visual Attention



Presentation Transcript


  1. Saliency-based Visual Attention 黃文中 2009-01-08

  2. Outline • Introduction • The Model • Results • Conclusion

  3. Outline • Introduction • The Model • Results • Conclusion

  4. If we are asked to find…

  5. Where to look? • Many visual processes are expensive • Humans don’t process the whole visual field • How do we decide what to process? • How can we use insights about this to make machine vision more efficient?

  6. Visual salience • Salience ~ visual prominence • Must be cheap to calculate • Related to features that we collect from very early stages of visual processing • Colour, orientation, intensity change and motion are all important indicators of salience

  7. Saliency Map • The Saliency Map is a topographically arranged map that represents the visual saliency of a corresponding visual scene.

  8. Saliency map (Cont'd) • Two types of stimuli: • Bottom-up • Depends only on the instantaneous sensory input • Does not take into account the internal state of the organism • Top-down • Takes into account the internal state • Such as the goals the organism has at the time, personal history, experiences, etc.

  9. Outline • Introduction • The Model • Results • Conclusion

  10. A Model of Saliency-based Visual Attention

  11. Three main steps • Extraction • extract feature vectors at locations over the image plane • Activation • form an “activation map” (or maps) using the feature vectors • Normalization / Combination • normalize the activation map (or maps), followed by a combination of the maps into a single map

  12. In detail… • Nine spatial scales are created using dyadic Gaussian pyramids. • Each feature is computed by a set of linear “center-surround” operations akin to visual receptive fields. • Normalization • Across-scale combination into three “conspicuity maps.” • Linear combination to create the saliency map. • Winner-take-all

  13. Image pyramids • The original image is decomposed into sets of lowpass and bandpass components via Gaussian and Laplacian pyramids. • The Gaussian pyramid consists of lowpass-filtered (LPF) copies of the image at successively coarser scales. • The Laplacian pyramid consists of bandpass-filtered (BPF) residuals between adjacent Gaussian levels.
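As a concrete illustration (a minimal sketch, not the paper's implementation), the two pyramids can be built with NumPy/SciPy; the smoothing sigma and interpolation order are assumed choices, and nine dyadic levels presume an input of at least 256×256:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def gaussian_pyramid(img, levels=9):
    """Dyadic Gaussian pyramid: each level is a lowpass-filtered (LPF)
    copy of the previous one at half the resolution."""
    pyr = [img.astype(np.float64)]
    for _ in range(levels - 1):
        blurred = gaussian_filter(pyr[-1], sigma=1.0)  # lowpass filter (assumed sigma)
        pyr.append(blurred[::2, ::2])                  # dyadic subsampling
    return pyr

def laplacian_pyramid(gauss):
    """Laplacian pyramid: each level is the bandpass-filtered (BPF)
    residual between adjacent Gaussian levels."""
    lap = []
    for fine, coarse in zip(gauss[:-1], gauss[1:]):
        up = zoom(coarse, 2, order=1)[:fine.shape[0], :fine.shape[1]]
        lap.append(fine - up)          # bandpass residual
    lap.append(gauss[-1])              # keep the coarsest lowpass level
    return lap
```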

  14. Image pyramids (Cont'd) [figure]

  15. Extraction of early visual features • Intensity image: I = (r + g + b) / 3 • Color channels: R = r − (g + b)/2, G = g − (r + b)/2, B = b − (r + g)/2, Y = (r + g)/2 − |r − g|/2 − b (negative values are set to zero) • Local orientation information: O(σ, θ), obtained from I using oriented Gabor pyramids, with θ ∈ {0°, 45°, 90°, 135°}
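In code, the intensity and broadly tuned color channels are direct per-pixel formulas; as a stand-in for Gabor pyramids, the orientation maps below apply scikit-image's Gabor filter to the full-resolution intensity image, with an assumed, illustrative frequency:

```python
import numpy as np
from skimage.filters import gabor

def early_features(rgb):
    """rgb: float array (H, W, 3) in [0, 1]; returns the early channels."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    I = (r + g + b) / 3.0                                  # intensity image
    # Broadly tuned color channels; negative values are set to zero:
    R = np.clip(r - (g + b) / 2.0, 0.0, None)
    G = np.clip(g - (r + b) / 2.0, 0.0, None)
    B = np.clip(b - (r + g) / 2.0, 0.0, None)
    Y = np.clip((r + g) / 2.0 - np.abs(r - g) / 2.0 - b, 0.0, None)
    return I, R, G, B, Y

def orientation_maps(I, thetas=(0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """O(sigma, theta) approximated by Gabor magnitude at four orientations."""
    out = []
    for theta in thetas:
        real, imag = gabor(I, frequency=0.25, theta=theta)  # assumed frequency
        out.append(np.hypot(real, imag))
    return out
```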

  16. In detail… • Nine spatial scales are created using dyadic Gaussian pyramids. • Each feature is computed by a set of linear “center-surround” operations akin to visual receptive fields. • Normalization • Across-scale combination into three “conspicuity maps.” • Linear combination to create the saliency map. • Winner-take-all

  17. Center-surround differences • The center-surround difference “⊖” between a fine “center” scale c and a coarser “surround” scale s is obtained by interpolation to the finer scale and point-by-point subtraction, with c ∈ {2, 3, 4} and s = c + δ, δ ∈ {3, 4}. • Intensity contrast: I(c, s) = |I(c) ⊖ I(s)| • Color double-opponent: RG(c, s) = |(R(c) − G(c)) ⊖ (G(s) − R(s))|, BY(c, s) = |(B(c) − Y(c)) ⊖ (Y(s) − B(s))| • Orientation feature maps: O(c, s, θ) = |O(c, θ) ⊖ O(s, θ)|
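A minimal sketch of the “⊖” operation over a Gaussian pyramid such as the one built in the earlier sketch; the interpolation order is an assumed choice:

```python
import numpy as np
from scipy.ndimage import zoom

def center_surround(pyr, c, s):
    """|pyr[c] (-) pyr[s]|: interpolate the coarse 'surround' scale s up to
    the finer 'center' scale c, then subtract point by point."""
    center = pyr[c]
    surround = zoom(pyr[s], 2 ** (s - c), order=1)
    surround = surround[:center.shape[0], :center.shape[1]]  # crop rounding slack
    return np.abs(center - surround)

# Example: the six intensity-contrast maps I(c, s),
# with c in {2, 3, 4} and s = c + delta, delta in {3, 4}:
# intensity_maps = [center_surround(I_pyr, c, c + d)
#                   for c in (2, 3, 4) for d in (3, 4)]
# For color double-opponency, apply the same operator to (R - G) and
# (B - Y) pyramids; for orientation, to each O(sigma, theta) pyramid.
```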

  18. In detail… • Nine spatial scales are created using dyadic Gaussian pyramids. • Each feature is computed by a set of linear “center-surround” operations akin to visual receptive fields. • Normalization • Across-scale combination into three “conspicuity maps.” • Linear combination to create the saliency map. • Winner-take-all

  19. Map normalization operator • Each feature map is passed through a map normalization operator, N(·), which promotes maps with a few strong activity peaks and suppresses maps with many comparable peaks:

  20. Map normalization operator (Cont'd) • Normalizing the values in the map to a fixed range [0..M], in order to eliminate modality-dependent amplitude differences • Finding the location of the map’s global maximum M and computing the average m of all its other local maxima • Globally multiplying the map by (M − m)²
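The three steps translate directly into code; the window used to detect local maxima is an assumed parameter, since the slide does not specify one:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def normalize_map(fmap, M=1.0, neighborhood=7):
    """Global non-linear normalization N(.). 'neighborhood' (the window
    used to detect local maxima) is an assumed parameter."""
    fmap = fmap - fmap.min()
    if fmap.max() > 0:
        fmap = fmap * (M / fmap.max())       # 1. normalize to [0, M]
    # 2. the global maximum is M by construction; average the other local maxima
    is_peak = fmap == maximum_filter(fmap, size=neighborhood)
    peaks = fmap[is_peak]
    peaks = peaks[peaks < M]                 # exclude the global maximum
    m = peaks.mean() if peaks.size else 0.0
    return fmap * (M - m) ** 2               # 3. multiply globally by (M - m)^2
```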

  21. Map normalization operator (Cont'd) • This method is called global non-linear normalization. • Pros: • Computationally very simple. • Easily allows for real-time implementation because it is non-iterative. • Cons: • The strategy is not very biologically plausible, since global computations are used. • Not robust to noise, which can be stronger than the signal.

  22. Iterative localized interactions • Non-classical surround inhibition • Interactions within each individual feature map rather than between maps • Inhibition appears strongest at a particular distance from the center, and weakens at both shorter and longer distances. • The structure of non-classical interactions can be coarsely modeled by a two-dimensional difference-of-Gaussians (DoG) connection pattern.
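A coarse sketch of one such iteration, realizing the DoG with two Gaussian blurs (narrow excitatory center minus broad inhibitory surround); every constant below is an illustrative assumption, not a value from Itti & Koch (2000):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def iterate_dog(fmap, n_iter=5, sigma_ex=2.0, sigma_inh=25.0,
                c_ex=0.5, c_inh=1.5, c_glob=0.02):
    """Iterative within-map competition: each iteration adds the map's own
    DoG-filtered activity and half-wave rectifies, so a few strong, isolated
    peaks survive while dense, comparable activity suppresses itself.
    All constants are assumed, illustrative values."""
    m = fmap.copy()
    for _ in range(n_iter):
        excite = c_ex * gaussian_filter(m, sigma_ex)     # narrow excitatory center
        inhibit = c_inh * gaussian_filter(m, sigma_inh)  # broad inhibitory surround
        m = np.maximum(0.0, m + excite - inhibit - c_glob * m.max())
    return m
```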

  23. Iterative localized interactions (Cont'd)

  24. Iterative localized interactions (Cont'd)

  25. Iterative localized interactions (Cont'd)

  26. In detail… • Nine spatial scales are created using dyadic Gaussian pyramids. • Each feature is computed by a set of linear “center-surround” operations akin to visual receptive fields. • Normalization • Across-scale combination into three “conspicuity maps.” • Linear combination to create the saliency map. • Winner-take-all

  27. Across-scale combination • The across-scale addition “⊕” consists of reduction of each map to scale 4 and point-by-point addition, yielding three conspicuity maps: • Intensity: Ī = ⊕_{c=2..4} ⊕_{s=c+3..c+4} N(I(c, s)) • Color: C̄ = ⊕_{c=2..4} ⊕_{s=c+3..c+4} [N(RG(c, s)) + N(BY(c, s))] • Orientation: Ō = Σ_θ N(⊕_{c=2..4} ⊕_{s=c+3..c+4} N(O(c, s, θ)))
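A sketch of the “⊕” operator under the assumption that each center-surround map lives at its center scale c; linear interpolation and the crop against rounding slack are illustrative choices:

```python
import numpy as np
from scipy.ndimage import zoom

def across_scale_add(maps, scales, target=4):
    """'(+)' operator: resample each map from its pyramid scale to scale 4,
    then accumulate point by point."""
    acc = None
    for fmap, scale in zip(maps, scales):
        resized = zoom(fmap, 2.0 ** (scale - target), order=1)
        if acc is None:
            acc = np.zeros_like(resized)
        h = min(acc.shape[0], resized.shape[0])
        w = min(acc.shape[1], resized.shape[1])
        acc[:h, :w] += resized[:h, :w]   # tolerate off-by-one rounding
    return acc

# e.g. the intensity conspicuity map, reusing the earlier sketches:
# I_maps = [normalize_map(center_surround(I_pyr, c, c + d))
#           for c in (2, 3, 4) for d in (3, 4)]
# I_bar = across_scale_add(I_maps, scales=[c for c in (2, 3, 4) for _ in (3, 4)])
```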

  28. In detail… • Nine spatial scales are created using dyadic Gaussian pyramids. • Each feature is computed by a set of linear “center-surround” operations akin to visual receptive fields. • Normalization • Across-scale combination into three “conspicuity maps.” • Linear combination to create the saliency map. • Winner-take-all

  29. The Saliency Map (SM) • The three conspicuity maps are normalized and summed into the final input S to the saliency map: S = (1/3) · (N(Ī) + N(C̄) + N(Ō)) • The weight of each channel is tunable.
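A minimal sketch of this combination, reusing normalize_map from the earlier normalization sketch; the equal 1/3 weights follow the formula above, and the w tuple exposes the per-channel tuning the slide mentions:

```python
def saliency_map(I_bar, C_bar, O_bar, w=(1 / 3, 1 / 3, 1 / 3)):
    """S = w_I * N(I_bar) + w_C * N(C_bar) + w_O * N(O_bar);
    equal weights of 1/3 reproduce the formula on the slide."""
    return (w[0] * normalize_map(I_bar)      # normalize_map from the N(.) sketch
            + w[1] * normalize_map(C_bar)
            + w[2] * normalize_map(O_bar))
```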

  30. In detail… • Nine spatial scales are created using dyadic Gaussian pyramids. • Each feature is computed by a set of linear “center-surround” operations akin to visual receptive fields. • Normalization • Across-scale combination into three “conspicuity maps.” • Linear combination to create the saliency map. • Winner-take-all

  31. Winner-take-all • At any given time, only one location is selected from the early representation and copied into the central representation.

  32. Winner-take-all (Cont'd) • The FOA is shifted to the location of the winner neuron. • The global inhibition of the WTA is triggered and completely inhibits (resets) all WTA neurons. • Local inhibition is transiently activated in the SM, in an area matching the size and new location of the FOA.
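The behavior of slides 31–32 can be sketched without simulating the WTA network's neural dynamics: take the global maximum of the saliency map as the winner, record the FOA shift, then zero a disc around it as a stand-in for inhibition of return. The FOA radius and shift count are assumed parameters:

```python
import numpy as np

def scan_path(sal, n_shifts=5, foa_radius=20):
    """Greedy stand-in for the WTA + inhibition-of-return loop."""
    sal = sal.copy()
    yy, xx = np.indices(sal.shape)
    fixations = []
    for _ in range(n_shifts):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)  # winner neuron
        fixations.append((y, x))                            # FOA shifts here
        # transient local inhibition at the FOA's size and new location:
        sal[(yy - y) ** 2 + (xx - x) ** 2 <= foa_radius ** 2] = 0.0
    return fixations
```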

  33. Outline • Introduction • The Model • Results • Conclusion

  34. Model performance on noisy versions of pop-out and conjunctive tasks

  35. Outline • Introduction • The Model • Results • Conclusion

  36. Conclusion • The paper proposes a conceptually simple computational model for saliency-driven focal visual attention. • The framework can consequently be easily tailored to arbitrary tasks through the implementation of dedicated feature maps.

  37. References • L. Itti, C. Koch, and E. Niebur, “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 11, pp. 1254-1259, Nov. 1998. • L. Itti and C. Koch, “A Saliency-Based Search Mechanism for Overt and Covert Shifts of Visual Attention,” Vision Research, Vol. 40, No. 10-12, pp. 1489-1506, May 2000. • H. Greenspan, S. Belongie, R. Goodman, P. Perona, S. Rakshit, and C.H. Anderson, “Overcomplete Steerable Pyramid Filters and Rotation Invariance,” Proc. IEEE Computer Vision and Pattern Recognition, pp. 222-228, Seattle, Wash., June 1994.
