
Correcting Cuboid Corruption For Action Recognition In Complex Environment



Presentation Transcript


  1. Correcting Cuboid Corruption For Action Recognition In Complex Environment Syed Zain Masood, Adarsh Nagaraja, Nazar Khan, Jiejie Zhu and Marshall Tappen University of Central Florida

  2. Action Sequences • Can be broadly divided into: • Activity: Person of interest performing the action • Background: Context and/or clutter • Simple datasets: background uninteresting • Complex datasets: context can be useful (Figure: example frame with the Activity and Background regions labeled)

  3. Complex Action Sequences • Most action recognition approaches treat the action recognition problem holistically. • Systems are designed to make intelligent decisions when selecting features. • Complexity is added until the goal is achieved. (Figure: global representation of the whole sequence)

  4. Issues with Holistic Methods • Little understanding of the decision-making process of these complex systems. • Most complex datasets have strong contextual cues. How well would a system perform on actions with unrelated, complex backgrounds?

  5. Our Approach • Goal • Examine action recognition in a way that separates action from context • Is the system able to make an intelligent decision when confronted with adverse context? • Purpose • Useful for measuring how much context matters • Will help improve the handling of background clutter • Avoids complexity that is unnecessary for recognition performance, and thus gains efficiency

  6. Our Approach • Problem: • Current datasets: strong contextual cues • Solution: • Create a new dataset in which the activity appears without strong, relevant context • Easier if based on older sets; this makes it possible to benchmark against earlier work

  7. UCF Weizmann Dynamic Dataset • Simple actions from Weizmann Action Dataset • Complex backgrounds from YouTube • Matte action on complex background [1] • Dataset available at: http://www.cs.ucf.edu/~smasood/datasets/UCFWeizmannDynamic.zip
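The matting step [1] composites a foreground actor onto an unrelated background. Once an alpha matte is available, the compositing itself is the standard per-pixel blend; the sketch below is a minimal illustration of that blend (the function name and toy frames are ours, not the authors' code, and the matte estimation itself is the closed-form method of [1], not shown).

```python
import numpy as np

def composite(fg, bg, alpha):
    """Alpha-blend a foreground action frame onto a new background.

    fg, bg : float arrays of shape (H, W, 3), values in [0, 1]
    alpha  : float array of shape (H, W) in [0, 1]; 1 = pure foreground
    """
    a = alpha[..., None]          # broadcast the matte over color channels
    return a * fg + (1.0 - a) * bg

# Toy 2x2 frames: opaque left column (actor), transparent right column.
fg = np.ones((2, 2, 3))
bg = np.zeros((2, 2, 3))
alpha = np.array([[1.0, 0.0],
                  [1.0, 0.0]])
out = composite(fg, bg, alpha)    # left column keeps fg, right shows bg
```

Repeating this blend frame-by-frame, with randomly chosen YouTube clips as `bg`, yields the UCF Weizmann Dynamic sequences.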

  8. UCF Weizmann Dynamic • No humans in the background • Backgrounds selected randomly for matting • Ensures the background is unhelpful • In some cases it might even be detrimental, e.g. different actions sharing the same complex background

  9. Testing Methodology • Baseline Performance • Basic “bag-of-words” system • Tuned to perform as well as a number of recently-published systems
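A bag-of-words baseline of the kind described here quantizes cuboid descriptors against a learned vocabulary and represents each video as a histogram of visual-word counts, which is then fed to a classifier. A minimal sketch of the quantization and histogram steps (function names, toy vocabulary, and descriptors are illustrative; vocabulary learning via k-means and the classifier are omitted):

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Assign each cuboid descriptor to its nearest visual word."""
    # (N, 1, D) - (1, K, D) -> (N, K) squared distances
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, vocabulary):
    """L1-normalized histogram of visual-word counts for one video."""
    words = quantize(descriptors, vocabulary)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy example: 2-word vocabulary, 3 descriptors in 2-D.
vocab = np.array([[0.0, 0.0], [10.0, 10.0]])
desc = np.array([[0.1, 0.2], [9.8, 10.1], [0.0, 0.1]])
h = bow_histogram(desc, vocab)    # two descriptors near word 0, one near word 1
```

The resulting histograms would then be classified, e.g. with an SVM, as in [2,3].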

  10. Baseline Performance • Significant drop in performance • Completely unable to deal with clutter.

  11. Why does performance degrade? • Is it the matting process? • No: tests on action sequences matted onto a gray background show 94% recognition. • The change from simple to complex backgrounds is the only difference between the datasets • Background cues are contributing significantly to the recognition process

  12. How to remove the effect of background? • Experiment #1: • Isolate actor from videos using available masks • Prune background interest points • With no background clutter, results should be comparable to those on Weizmann dataset
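Experiment #1 keeps only the interest points that land on the actor, using the ground-truth masks. A minimal sketch of that pruning step, assuming points are stored as (x, y, t) triples and the mask as a boolean video volume (names and layout are ours):

```python
import numpy as np

def prune_points(points, actor_mask):
    """Keep only interest points that fall on the actor.

    points     : (N, 3) integer array of (x, y, t) detections
    actor_mask : (T, H, W) boolean array, True on the actor
    """
    keep = [actor_mask[t, y, x] for x, y, t in points]
    return points[np.array(keep, dtype=bool)]

# Toy mask: the actor occupies the left half of a single 4x4 frame.
mask = np.zeros((1, 4, 4), dtype=bool)
mask[:, :, :2] = True
pts = np.array([[0, 1, 0],    # lands on the actor
                [3, 1, 0]])   # lands on background clutter
kept = prune_points(pts, mask)
```

Only the point on the actor survives; the background detection is discarded before feature extraction.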

  13. Experiment #1: Background Pruning • Interestingly, results improve but not as significantly as they dropped • Simple background pruning does not help our cause

  14. Background Pruning Limitations • Out-of-place actions have background clutter in cuboid at “good” interest point locations. • Interest point pruning eliminates spatial but not temporal background clutter.

  15. How to overcome this limitation? • Removing background information within cuboids might be helpful • Experiment #2: • Cuboid Masking: Zero out background frames
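Cuboid masking zeroes out the background pixels inside each extracted cuboid, so clutter cannot leak into the descriptor even at "good" interest point locations. A minimal sketch, assuming the cuboid and its mask are cropped from the same spatio-temporal window (names are illustrative):

```python
import numpy as np

def mask_cuboid(cuboid, actor_mask):
    """Zero out background pixels inside a spatio-temporal cuboid.

    cuboid     : (T, H, W) patch of pixel values around an interest point
    actor_mask : (T, H, W) boolean patch cropped from the same location
    """
    return np.where(actor_mask, cuboid, 0.0)

# Toy 2x2x2 cuboid of constant intensity; actor in the left column only.
cub = np.full((2, 2, 2), 5.0)
m = np.zeros((2, 2, 2), dtype=bool)
m[:, :, 0] = True
masked = mask_cuboid(cub, m)   # right column is suppressed to zero
```

Because the mask is applied per frame inside the cuboid, this removes temporal as well as spatial background clutter, which point pruning alone cannot do.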

  16. Cuboid Masking Results • Comparable results achieved with background pruning of interest points and masking within “good” interest point cuboids

  17. The Next Steps • All of the above experiments were conducted using ground-truth annotations. • Now that we have identified the problem: • Need to do away with ground-truth actor masks and implement automatic localization of the actor. • Need to test the system on a well-known complex dataset where context might be helpful.

  18. Automatic Localization • We combine: • An off-the-shelf human detector [4,5] • Saliency detector method [6]
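One simple way to combine the two cues is to intersect the detector's bounding box with a thresholded saliency map, yielding an approximate actor mask per frame. This is a hypothetical sketch of such a combination, not the exact scheme of [4,5,6]; the function name, threshold, and box format are assumptions.

```python
import numpy as np

def localize(saliency, box, thresh=0.5):
    """Combine a human-detector box with a saliency map into an actor mask.

    saliency : (H, W) float map in [0, 1]
    box      : (x0, y0, x1, y1) detection from the human detector
    """
    salient = saliency >= thresh           # pixels the saliency model flags
    box_mask = np.zeros_like(salient)
    x0, y0, x1, y1 = box
    box_mask[y0:y1, x0:x1] = True          # pixels inside the detection
    return salient & box_mask              # agree on both cues -> actor

# Toy 4x4 frame: salient blob at rows/cols 1-2, detector box over rows/cols 0-1.
sal = np.zeros((4, 4))
sal[1:3, 1:3] = 0.9
m = localize(sal, (0, 0, 2, 2))            # only pixel (1, 1) satisfies both
```

The resulting mask can then drive the same interest point pruning and cuboid masking used with ground truth.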

  19. Automatic Localization • Automatic localization not as good on UCF Weizmann Dynamic dataset • Still, optimal performance is achieved using both interest point pruning and cuboid masking for the automatic localization

  20. UCF Sports Dataset • Reasons for selecting this dataset: • Small size • Good resolution • Ground-truth actor masks available

  21. UCF Sports Dataset • Experiment using ground-truth masks: • Using both techniques gives the optimal performance

  22. UCF Sports Dataset • Experiment using automatic localization: • Automatic localization results not as bad as for UCF Weizmann Dynamic dataset • Again, using cuboid masking results in the best performance

  23. What Have We Learned? • Holistic approaches suffer without good context. • Localization is important and thus localization methods need to improve. • Correct use of localization is essential. • Once we can localize well, we can bring context back as an additional cue.

  24. References
  [1] A. Levin, D. Lischinski, and Y. Weiss. A closed-form solution to natural image matting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:228–242, 2008.
  [2] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In Computer Vision and Pattern Recognition (CVPR), pages 1–8, 2008.
  [3] J. Liu, J. Luo, and M. Shah. Recognizing realistic actions from videos "in the wild". In Computer Vision and Pattern Recognition (CVPR), pages 461–468, 2009.
  [4] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester. Discriminatively trained deformable part models, release 4. http://people.cs.uchicago.edu/~pff/latent-release4/.
  [5] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32:1627–1645, 2010.
  [6] S. Goferman, L. Zelnik-Manor, and A. Tal. Context-aware saliency detection. In CVPR, pages 2376–2383, 2010.

  25. Q & A • This work was supported by NSF grants IIS-0905387 and IIS-0916868.
