Correcting Cuboid Corruption For Action Recognition In Complex Environment
Syed Zain Masood, Adarsh Nagaraja, Nazar Khan, Jiejie Zhu and Marshall Tappen
University of Central Florida
Action Sequences
• Can be broadly divided into:
• Activity: the person of interest performing the action
• Background: context and/or clutter
• Simple datasets: the background is uninteresting
• Complex datasets: context can be useful
Complex Action Sequences
• Most action recognition approaches treat the action recognition problem holistically.
• Systems are designed to make intelligent decisions when selecting features.
• Complexity is added until the goal is achieved.
Issues with Holistic Methods
• Little insight into the decision-making process of these complex systems.
• Most complex datasets have strong contextual cues. How well would a system perform on actions with unrelated complex backgrounds?
Our Approach
• Goal
• Examine action recognition in a way that separates action from context
• Is the system able to make an intelligent decision when confronted with adverse context?
• Purpose
• Useful for measuring how much context matters
• Helps improve the handling of background clutter
• Avoids complexity that adds nothing to recognition performance, and thus gains efficiency
Our Approach
• Problem:
• Current datasets contain strong contextual cues
• Solution:
• Create a new dataset where the activity appears without strong, relevant context
• Building on older datasets makes it possible to benchmark against earlier work
UCF Weizmann Dynamic Dataset
• Simple actions from the Weizmann Action Dataset
• Complex backgrounds from YouTube
• Matte the action onto the complex background [1] (see the sketch below)
• Dataset available at: http://www.cs.ucf.edu/~smasood/datasets/UCFWeizmannDynamic.zip
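The per-frame compositing behind this matting step reduces to the standard matting equation from [1], I = αF + (1 − α)B. Below is a minimal sketch in Python, assuming float image arrays and a precomputed alpha matte; the function name is ours, not from the paper.

```python
import numpy as np

def composite_frame(foreground, background, alpha):
    """Composite an actor frame onto a new background using a per-pixel
    alpha matte in [0, 1]: I = alpha * F + (1 - alpha) * B.
    `foreground` and `background` are (H, W, 3) float arrays; `alpha` is (H, W)."""
    a = alpha[..., np.newaxis]  # broadcast the matte over color channels
    return a * foreground + (1.0 - a) * background
```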
UCF Weizmann Dynamic
• No humans in the background
• Backgrounds selected randomly for matting
• This ensures the background is unhelpful
• In some cases it may even be detrimental, e.g. different actions sharing the same complex background
Testing Methodology
• Baseline performance
• A basic “bag-of-words” system (sketched below)
• Tuned to perform as well as a number of recently published systems
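A minimal sketch of such a bag-of-words pipeline: k-means over cuboid descriptors to build a vocabulary, per-video word histograms, and a linear SVM. This is a generic reconstruction under our own assumptions (descriptor extraction happens elsewhere), not the authors' exact configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def build_vocabulary(all_descriptors, k=500):
    """Cluster cuboid descriptors (one row per cuboid) into k visual words."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_descriptors)

def video_histogram(vocab, descriptors):
    """Quantize one video's descriptors and return a normalized word histogram."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Training (descriptor lists and labels assumed given):
# vocab = build_vocabulary(np.vstack(train_descriptors))
# X = np.array([video_histogram(vocab, d) for d in train_descriptors])
# clf = LinearSVC().fit(X, train_labels)
```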
Baseline Performance
• Significant drop in performance
• The system is completely unable to deal with clutter
Why Does Performance Degrade?
• Is it the matting process?
• No: tests on action sequences matted onto a gray background show 94% recognition.
• The change from a simple to a complex background is the only difference between the datasets.
• So background cues are contributing significantly to the recognition process.
How to Remove the Effect of the Background?
• Experiment #1:
• Isolate the actor using the available ground-truth masks
• Prune background interest points (see the sketch below)
• With no background clutter, results should be comparable to those on the Weizmann dataset
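Pruning here can be as simple as dropping every interest point whose center falls off the actor. A minimal sketch, assuming points are (x, y, t) tuples and masks[t] is the binary ground-truth actor mask for frame t (this representation is our assumption):

```python
def prune_interest_points(points, masks):
    """Keep only interest points whose (x, y, t) center lies on the actor,
    where masks[t][y, x] is True on actor pixels."""
    return [(x, y, t) for (x, y, t) in points if masks[t][y, x]]
```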
Experiment #1: Background Pruning
• Interestingly, results improve, but not by as much as they dropped
• Simple background pruning alone does not solve the problem
Background Pruning Limitations
• Out-of-place actions leave background clutter inside the cuboids at “good” interest point locations.
• Interest point pruning eliminates spatial, but not temporal, background clutter.
How to Overcome This Limitation?
• Removing the background information within the cuboids themselves might help
• Experiment #2:
• Cuboid masking: zero out the background inside each cuboid (see the sketch below)
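A minimal sketch of cuboid masking: cut the spatio-temporal volume around an interest point and zero out everything the actor masks mark as background. The array layout and half-sizes are our assumptions, and the point is assumed to lie far enough from the volume borders.

```python
import numpy as np

def extract_masked_cuboid(video, masks, x, y, t, half=(8, 8, 4)):
    """Cut a cuboid around interest point (x, y, t) and zero its background
    voxels. `video` and `masks` are (T, H, W) arrays; `half` holds
    half-sizes in x, y, t."""
    hx, hy, ht = half
    cuboid = video[t - ht:t + ht, y - hy:y + hy, x - hx:x + hx].astype(float)
    support = masks[t - ht:t + ht, y - hy:y + hy, x - hx:x + hx]
    return cuboid * support  # background voxels become zero
```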
Cuboid Masking Results
• Background pruning of interest points combined with masking within the “good” interest point cuboids achieves comparable results
The Next Steps
• All the experiments above used ground-truth annotations.
• Now that we have identified the problem:
• Replace the ground-truth actor masks with automatic localization of the actor.
• Test the system on a well-known complex dataset where context might be helpful.
Automatic Localization
• We combine:
• An off-the-shelf human detector [4,5]
• A saliency detection method [6]
• (One plausible fusion is sketched below)
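How the two cues are combined admits several designs; the intersection rule below is our assumption, not necessarily the paper's exact scheme. It keeps pixels that are both salient and covered by some human detection.

```python
import numpy as np

def localize_actor(saliency, detections, thresh=0.5):
    """Fuse a saliency map (H x W, values in [0, 1]) with detector boxes
    [(x0, y0, x1, y1), ...] into a binary actor mask."""
    salient = saliency >= thresh
    boxes = np.zeros(saliency.shape, dtype=bool)
    for x0, y0, x1, y1 in detections:
        boxes[y0:y1, x0:x1] = True  # mark pixels inside each detection
    return salient & boxes
```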
Automatic Localization
• Automatic localization is not as accurate on the UCF Weizmann Dynamic dataset
• Still, the best performance is achieved by using both interest point pruning and cuboid masking on top of the automatic localization
UCF Sports Dataset
• Reasons for selecting this dataset:
• Small size
• Good resolution
• Ground-truth actor masks available
UCF Sports Dataset
• Experiment using ground-truth masks:
• Using both techniques gives the best performance
UCF Sports Dataset
• Experiment using automatic localization:
• Automatic localization results are not as degraded as on the UCF Weizmann Dynamic dataset
• Again, cuboid masking yields the best performance
What Have We Learned?
• Holistic approaches suffer without good context.
• Localization is important, so localization methods need to improve.
• Correct use of localization is essential.
• Once we can localize well, we can bring context back as an additional cue.
References
[1] A. Levin, D. Lischinski, and Y. Weiss. A closed-form solution to natural image matting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:228–242, 2008.
[2] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In Computer Vision and Pattern Recognition (CVPR), pages 1–8, 2008.
[3] J. Liu, J. Luo, and M. Shah. Recognizing realistic actions from videos “in the wild”. In Computer Vision and Pattern Recognition (CVPR), pages 461–468, 2009.
[4] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester. Discriminatively trained deformable part models, release 4. http://people.cs.uchicago.edu/~pff/latent-release4/.
[5] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32:1627–1645, 2010.
[6] S. Goferman, L. Zelnik-Manor, and A. Tal. Context-aware saliency detection. In CVPR, pages 2376–2383. IEEE, 2010.
Q & A
• This work was supported by NSF grants IIS-0905387 and IIS-0916868.