Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words

Analysis and Recognition of Video Data. Tamir Nuriel

Flowchart of the approach

Interest Points Detector • Gaussian smoothing in the spatial dimensions. • Gabor filters in the temporal dimension. • Extract a spatial-temporal cube around each interest point.
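The temporal part of the detector can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes the brightness of one pixel has already been Gaussian-smoothed in space, and it applies an even/odd (quadrature) pair of 1-D Gabor filters along time, summing the squared outputs. The function names and the parameters `tau` (envelope width in frames) and `omega` (cycles per frame) are assumptions chosen for the example.

```python
import math

def gabor_pair(tau, omega):
    """Even and odd 1-D temporal Gabor kernels with a Gaussian envelope."""
    half = int(math.ceil(2 * tau))
    ts = range(-half, half + 1)
    h_ev = [-math.cos(2 * math.pi * t * omega) * math.exp(-t * t / tau ** 2) for t in ts]
    h_od = [-math.sin(2 * math.pi * t * omega) * math.exp(-t * t / tau ** 2) for t in ts]
    return h_ev, h_od

def convolve_same(signal, kernel):
    """Zero-padded 1-D convolution that returns len(signal) samples."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i - (k - half)
            if 0 <= j < len(signal):
                acc += signal[j] * w
        out.append(acc)
    return out

def temporal_response(signal, tau=2.0, omega=0.25):
    """Squared quadrature response of one pixel's brightness over time."""
    h_ev, h_od = gabor_pair(tau, omega)
    ev = convolve_same(signal, h_ev)
    od = convolve_same(signal, h_od)
    return [e * e + o * o for e, o in zip(ev, od)]

# A pixel flickering at the tuned frequency responds far more strongly
# than a static pixel of the same brightness.
static = temporal_response([1.0] * 32)
periodic = temporal_response([math.cos(math.pi * t / 2) for t in range(32)])
print(periodic[16] > static[16])
```

Interest points would then be taken at local maxima of this response over space and time, with a cube of pixels extracted around each one.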

Descriptor • Brightness gradients in the x, y and t directions. • The computed gradients are concatenated to form a vector. This descriptor is then projected to a lower-dimensional space using principal component analysis (PCA). • Alternative to PCA dimensionality reduction: a histogram of the gradients in each direction.
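Both descriptor variants can be sketched on a small cube. The code below is illustrative only: it assumes the cube is a nested list indexed as `cube[t][y][x]`, uses central differences at interior voxels, and picks the histogram range and bin count arbitrarily. The PCA projection is omitted; all function names are hypothetical.

```python
def cuboid_gradients(cube):
    """Central-difference brightness gradients (gx, gy, gt) at interior voxels."""
    T, H, W = len(cube), len(cube[0]), len(cube[0][0])
    gx, gy, gt = [], [], []
    for t in range(1, T - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                gx.append((cube[t][y][x + 1] - cube[t][y][x - 1]) / 2.0)
                gy.append((cube[t][y + 1][x] - cube[t][y - 1][x]) / 2.0)
                gt.append((cube[t + 1][y][x] - cube[t - 1][y][x]) / 2.0)
    return gx, gy, gt

def concatenated_descriptor(cube):
    """Variant 1: concatenate all gradients (PCA would be applied afterwards)."""
    gx, gy, gt = cuboid_gradients(cube)
    return gx + gy + gt

def histogram(values, bins, lo, hi):
    """Normalized histogram; values outside [lo, hi) go to the edge bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    total = float(len(values)) or 1.0
    return [c / total for c in counts]

def histogram_descriptor(cube, bins=4, lo=-1.0, hi=1.0):
    """Variant 2: one gradient histogram per direction, concatenated."""
    gx, gy, gt = cuboid_gradients(cube)
    return (histogram(gx, bins, lo, hi)
            + histogram(gy, bins, lo, hi)
            + histogram(gt, bins, lo, hi))

# A 4x4x4 cube whose brightness ramps along x only.
ramp = [[[0.5 * x for x in range(4)] for _ in range(4)] for _ in range(4)]
print(len(concatenated_descriptor(ramp)), histogram_descriptor(ramp))
```

The histogram variant has a fixed length regardless of cube size, which is one reason it can stand in for a PCA projection.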

Codebook Formation • The codebook is constructed by clustering the descriptors with the k-means algorithm, using Euclidean distance as the clustering metric. • The center of each resulting cluster is defined to be a spatial-temporal codeword.
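Codebook formation can be sketched with a plain k-means loop. This is a minimal version under stated assumptions: descriptors are lists of floats, initial centers are sampled from the data with a fixed seed, and the helper `video_histogram` (a hypothetical name) shows how a video would then be summarized as codeword counts.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))

def kmeans(points, k, iters=50, seed=0):
    """Return k cluster centers; each center is one codeword."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        new_centers = []
        for old, g in zip(centers, groups):
            if g:
                new_centers.append([sum(col) / len(g) for col in zip(*g)])
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break  # assignments no longer change
        centers = new_centers
    return centers

def video_histogram(descriptors, codebook):
    """Count how many descriptors of one video map to each codeword."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest(d, codebook)] += 1
    return counts

points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
codebook = sorted(kmeans(points, 2))
print(codebook, video_histogram([[0, 0], [10, 10], [11, 11]], codebook))
```

Minimizing squared Euclidean distance gives the same assignments as minimizing Euclidean distance, so the square root is skipped.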

Learning the Action Models by pLSA • Maximizing • E-step: • M-step:
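The equations on this slide did not survive in the transcript, so the sketch below uses the standard pLSA formulation as an assumption rather than a reconstruction of the slide. Videos play the role of documents d, codewords are words w, and action categories are latent topics z. The quantity maximized is the log-likelihood, the sum over d and w of n(d,w) log P(w|d), with P(w|d) equal to the sum over z of P(w|z) P(z|d). All names in the code are hypothetical.

```python
import math
import random

def normalize(row):
    """Scale a non-negative row to sum to 1 (uniform if it is all zeros)."""
    s = sum(row)
    return [v / s for v in row] if s > 0 else [1.0 / len(row)] * len(row)

def log_likelihood(counts, p_w_z, p_z_d):
    """Sum over d, w of n(d,w) * log sum_z P(w|z) P(z|d)."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(p_w_z[z][w] * p_z_d[d][z] for z in range(len(p_w_z)))
                ll += n * math.log(p)
    return ll

def plsa(counts, n_topics, iters=50, seed=0):
    """EM for pLSA on a video-by-codeword count matrix."""
    rng = random.Random(seed)
    D, W = len(counts), len(counts[0])
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(W)]) for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in range(D)]
    history = []
    for _ in range(iters):
        new_w_z = [[0.0] * W for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(D)]
        for d in range(D):
            for w in range(W):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d)
                post = normalize([p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)])
                # M-step: accumulate n(d,w) * P(z|d,w) for both parameter sets
                for z in range(n_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        p_w_z = [normalize(r) for r in new_w_z]
        p_z_d = [normalize(r) for r in new_z_d]
        history.append(log_likelihood(counts, p_w_z, p_z_d))
    return p_w_z, p_z_d, history

# Toy counts: videos 0-1 mostly use codewords 0-1, videos 2-3 use codewords 2-3.
counts = [[5, 4, 0, 0], [6, 5, 0, 0], [0, 0, 4, 6], [0, 0, 5, 5]]
p_w_z, p_z_d, history = plsa(counts, n_topics=2)
print(history[0] <= history[-1])
```

EM only guarantees a local maximum of the likelihood, so results can depend on the random initialization. A video would typically be assigned to the topic z with the largest P(z|d).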

Experimental results • Patches from different actions from the KTH dataset:

Experimental results • Marking patches in video

Experimental results • Confusion Matrix

References • J. C. Niebles, H. Wang and L. Fei-Fei, “Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words”, International Journal of Computer Vision. In press. 2008. • C. Schuldt, I. Laptev, B. Caputo, “Recognizing human actions: a local SVM approach”, In Proc. ICPR 2004. • L. Zelnik-Manor, M. Irani, “Event-based analysis of video”, CVPR 2001.
