Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words. Analysis and Recognition of Video Data. Tamir Nuriel.
Interest Point Detector • Gaussian smoothing along the spatial dimensions. • Gabor filtering along the temporal dimension. • Extract a spatial-temporal cube around each interest point.
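The detector steps above can be sketched in NumPy as follows. This is only an illustration, assuming the response function used in the Niebles et al. paper listed in the references (squared outputs of a quadrature pair of 1-D temporal Gabor filters applied to a spatially smoothed video). The function names, the kernel radii, and the default values of `sigma`, `tau` and `omega` are assumptions chosen for the example, not values taken from the slides.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel (radius of 3 sigma is an assumption)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gabor_pair(tau, omega):
    """Quadrature pair of 1-D Gabor filters (even and odd) over time."""
    radius = int(np.ceil(2 * tau))
    t = np.arange(-radius, radius + 1)
    envelope = np.exp(-t**2 / tau**2)
    h_ev = -np.cos(2 * np.pi * t * omega) * envelope
    h_od = -np.sin(2 * np.pi * t * omega) * envelope
    return h_ev, h_od

def convolve_axis(volume, kernel, axis):
    """Convolve every 1-D slice along `axis` with `kernel`.

    Each axis of `volume` must be at least as long as the kernel.
    """
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, volume)

def response(video, sigma=1.5, tau=2.5, omega=0.25):
    """Response R = (I*g*h_ev)^2 + (I*g*h_od)^2 for a (T, H, W) video."""
    g = gaussian_kernel(sigma)
    # Gaussian smoothing over the two spatial axes only.
    smoothed = convolve_axis(convolve_axis(video, g, 1), g, 2)
    # Gabor filtering over the temporal axis only.
    h_ev, h_od = gabor_pair(tau, omega)
    even = convolve_axis(smoothed, h_ev, 0)
    odd = convolve_axis(smoothed, h_od, 0)
    return even**2 + odd**2

def extract_cube(video, center, half_size=(3, 4, 4)):
    """Cut a spatial-temporal cube around `center` = (t, y, x), clipped at the borders."""
    slices = tuple(slice(max(c - h, 0), c + h + 1)
                   for c, h in zip(center, half_size))
    return video[slices]
```

Interest points would then be taken at local maxima of `response(video)`, and `extract_cube` applied at each of them. Separable 1-D convolutions keep the sketch dependency-free; a production implementation would more likely use an optimized filtering library.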
Descriptor • Brightness gradients in the x, y and t directions. • The computed gradients are concatenated to form a vector, which is then projected to a lower-dimensional space using principal component analysis (PCA). • Alternative to PCA dimensionality reduction: a histogram of the gradients in each direction.
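A minimal sketch of both descriptor variants, assuming each cube is a small (T, H, W) NumPy array. The function names, the number of histogram bins and the histogram value range are hypothetical choices for illustration; the slides do not specify them.

```python
import numpy as np

def gradient_descriptor(cube):
    """Concatenate brightness gradients along x, y and t into one vector."""
    gt, gy, gx = np.gradient(cube.astype(float))  # axes are ordered (t, y, x)
    return np.concatenate([gx.ravel(), gy.ravel(), gt.ravel()])

def fit_pca(descriptors, n_components):
    """Return the mean and the top principal directions of the descriptors."""
    mean = descriptors.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(descriptors - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(descriptors, mean, components):
    """Project descriptors onto the lower-dimensional PCA subspace."""
    return (descriptors - mean) @ components.T

def histogram_descriptor(cube, bins=8, value_range=(-1.0, 1.0)):
    """Alternative to PCA: one gradient histogram per direction, concatenated."""
    gt, gy, gx = np.gradient(cube.astype(float))
    hists = [np.histogram(np.clip(g, *value_range), bins=bins,
                          range=value_range)[0]
             for g in (gx, gy, gt)]
    h = np.concatenate(hists).astype(float)
    return h / max(h.sum(), 1.0)  # normalize so the entries sum to 1
```

The PCA variant keeps the spatial layout of the gradients but needs a training set to fit the projection. The histogram variant has a fixed, short length and needs no fitting, at the cost of discarding where in the cube each gradient occurred.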
Codebook Formation • The codebook is constructed by clustering the descriptors with the k-means algorithm, using Euclidean distance as the clustering metric. • The center of each resulting cluster is defined to be a spatial-temporal codeword.
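The codebook construction can be sketched with a plain Lloyd's k-means loop in NumPy. The function names, the random initialization and the iteration limit are assumptions made for this example. The `bag_of_words` helper, which counts how often each codeword occurs in one video, is likewise an illustrative addition.

```python
import numpy as np

def assign_codewords(descriptors, centers):
    """Index of the nearest center (Euclidean distance) for each descriptor."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans_codebook(descriptors, k, n_iter=50, seed=0):
    """Cluster descriptors with k-means; the cluster centers are the codewords."""
    descriptors = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct descriptors chosen at random.
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[idx].copy()
    for _ in range(n_iter):
        labels = assign_codewords(descriptors, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def bag_of_words(descriptors, centers):
    """Histogram of codeword occurrences for one video's descriptors."""
    labels = assign_codewords(np.asarray(descriptors, dtype=float), centers)
    return np.bincount(labels, minlength=len(centers))
```

Stacking the `bag_of_words` vectors of all videos as columns gives the codeword-by-video count matrix on which a topic model can be trained.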
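Learning the Action Models by pLSA • Maximizing the likelihood of the observed codeword counts with the EM algorithm. • E-step: compute the posterior of each latent topic (action category) given a codeword and a video. • M-step: re-estimate the codeword-given-topic and topic-given-video distributions from those posteriors.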
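The equations on this slide did not survive extraction, so the sketch below follows the standard pLSA formulation instead of reproducing them. It assumes a count matrix n(w, d) of codewords by videos and models P(w|d) as the sum over topics z of P(w|z) P(z|d). The names `plsa` and `log_likelihood`, the random initialization and the iteration count are illustrative choices.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM on a (n_words, n_docs) count matrix n(w, d).

    Returns P(w|z) with shape (n_words, n_topics) and
    P(z|d) with shape (n_topics, n_docs).
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    n_words, n_docs = counts.shape
    p_w_z = rng.random((n_words, n_topics))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z_d = rng.random((n_topics, n_docs))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(z|w,d) proportional to P(w|z) P(z|d); shape (W, Z, D).
        joint = p_w_z[:, :, None] * p_z_d[None, :, :]
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), eps)
        # M-step: re-estimate both distributions from n(w,d) P(z|w,d).
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=2)
        p_w_z /= np.maximum(p_w_z.sum(axis=0, keepdims=True), eps)
        p_z_d = weighted.sum(axis=0)
        p_z_d /= np.maximum(p_z_d.sum(axis=0, keepdims=True), eps)
    return p_w_z, p_z_d

def log_likelihood(counts, p_w_z, p_z_d):
    """Objective that EM maximizes: sum over w, d of n(w,d) log P(w|d)."""
    p_w_d = p_w_z @ p_z_d
    return float((counts * np.log(np.maximum(p_w_d, 1e-12))).sum())
```

After fitting, a video can be assigned to the action category whose topic has the largest P(z|d). The (W, Z, D) posterior array is fine for small codebooks, but a larger problem would need a formulation that iterates over the nonzero counts only.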
Experimental results • Patches from different actions from the KTH dataset:
Experimental results • Marking patches in video
Experimental results • Confusion Matrix
References • J. C. Niebles, H. Wang and L. Fei-Fei, “Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words”, International Journal of Computer Vision, in press, 2008. • C. Schuldt, I. Laptev and B. Caputo, “Recognizing human actions: a local SVM approach”, in Proc. ICPR, 2004. • L. Zelnik-Manor and M. Irani, “Event-based analysis of video”, in Proc. CVPR, 2001.