
A DISTRIBUTION BASED VIDEO REPRESENTATION FOR HUMAN ACTION RECOGNITION

Yan Song, Sheng Tang, Yan-Tao Zheng, Tat-Seng Chua, Yongdong Zhang, Shouxun Lin. Laboratory of Advanced Computing Research, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

Presentation Transcript


  1. A DISTRIBUTION BASED VIDEO REPRESENTATION FOR HUMAN ACTION RECOGNITION. Yan Song, Sheng Tang, Yan-Tao Zheng, Tat-Seng Chua, Yongdong Zhang, Shouxun Lin. (1) Laboratory of Advanced Computing Research, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; (2) Graduate School of Chinese Academy of Sciences, Beijing, China; (3) Institute for Infocomm Research, A*STAR, Singapore; (4) School of Computing, National University of Singapore, Singapore

  2. Outline • Introduction • Steps Overview • Experiments • Conclusion

  3. Introduction • Recently, researchers have turned their attention to local spatio-temporal (ST) features for human action recognition • The Bag-of-Words (BoW) model has some drawbacks: • It partitions the local feature space into discrete parts (visual words), which introduces ambiguity and uncertainty into the video representation • Re-training is required when a new category is added to the database or when the model is applied to a new database
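To make the first drawback concrete, here is a minimal Python sketch of standard BoW hard assignment, assuming a codebook has already been learned (the two-word codebook below is a made-up toy). Two nearly identical features on either side of the boundary between codewords end up in different histogram bins.

```python
import numpy as np

def bow_histogram(features, codebook):
    """Hard-assign each feature to its nearest codeword; return (normalized histogram, word ids)."""
    # Squared Euclidean distance from every feature (N, d) to every codeword (K, d).
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum(), words

# Toy codebook with two visual words (illustrative values only).
codebook = np.array([[0.0, 0.0], [2.0, 0.0]])
# Two almost identical features straddling the decision boundary at x = 1.
features = np.array([[0.99, 0.0], [1.01, 0.0]])
hist, words = bow_histogram(features, codebook)
print(words)  # each feature falls into a different visual word
print(hist)
```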

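  4. Steps Overview • Extract local spatio-temporal (ST) features • Apply a 2D Gaussian filter in the spatial domain • Apply a quadrature pair of 1D Gabor filters in the temporal domain • Detect interest points at the local maxima of the response function $R = (I * g_\sigma(x,y) * h_{ev}(t))^2 + (I * g_\sigma(x,y) * h_{od}(t))^2$, where $I$ is the video, $g_\sigma$ the spatial Gaussian kernel, and $h_{ev}$, $h_{od}$ the even and odd temporal Gabor filters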
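The detector on this slide can be sketched with NumPy as below. This is an illustration under stated assumptions, not the authors' code: the kernel truncation, the parameter values (`sigma`, `tau`, `omega`) and the 3x3x3 non-maximum suppression are our choices. It uses the quadrature pair $h_{ev}(t) = -\cos(2\pi t\omega)e^{-t^2/\tau^2}$ and $h_{od}(t) = -\sin(2\pi t\omega)e^{-t^2/\tau^2}$ from [3], which ties ω to τ; here ω is left as a free parameter.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def temporal_gabor_pair(tau, omega):
    """Even/odd 1D Gabor filters h_ev(t), h_od(t); omega is in cycles per frame (assumed value)."""
    radius = int(np.ceil(2 * tau))
    t = np.arange(-radius, radius + 1, dtype=float)
    envelope = np.exp(-t ** 2 / tau ** 2)
    return (-np.cos(2 * np.pi * t * omega) * envelope,
            -np.sin(2 * np.pi * t * omega) * envelope)

def filter_axis(volume, kernel, axis):
    """Convolve every 1D line of `volume` along `axis` (line length must be >= kernel length)."""
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), axis, volume)

def response(video, sigma=2.0, tau=3.0, omega=0.2):
    """R = (I * g * h_ev)^2 + (I * g * h_od)^2 for a video of shape (T, H, W)."""
    g = gaussian_kernel(sigma)
    smoothed = filter_axis(filter_axis(video.astype(float), g, 1), g, 2)  # spatial: y, then x
    h_ev, h_od = temporal_gabor_pair(tau, omega)
    even = filter_axis(smoothed, h_ev, 0)  # temporal filtering along t
    odd = filter_axis(smoothed, h_od, 0)
    return even ** 2 + odd ** 2

def interest_points(R, threshold):
    """(t, y, x) positions that are maxima of their 3x3x3 neighbourhood and exceed `threshold`."""
    T, H, W = R.shape
    padded = np.pad(R, 1, mode="constant", constant_values=-np.inf)
    neighbourhood_max = np.full(R.shape, -np.inf)
    for dt in range(3):
        for dy in range(3):
            for dx in range(3):
                neighbourhood_max = np.maximum(
                    neighbourhood_max, padded[dt:dt + T, dy:dy + H, dx:dx + W])
    return np.argwhere((R >= neighbourhood_max) & (R > threshold))

# Toy video: a bright square that appears for a few frames, then disappears.
video = np.zeros((30, 40, 40))
video[12:18, 15:25, 15:25] = 1.0
R = response(video)
points = interest_points(R, threshold=0.5 * R.max())
print(points[:5])
```

A static, uniformly dark region produces no response, so interest points concentrate where the image content changes over time, which is the behaviour the response function is designed for.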

  5. Generating the feature vector (Behavior Recognition via Sparse Spatio-Temporal Features) [3] • Gradients can be computed not only along x and y but also along t • Spatio-temporal corners are defined as regions where the local gradient vectors point in orthogonal directions spanning x, y and t. Intuitively, a spatio-temporal corner is an image region containing a spatial corner whose velocity vector is reversing direction • Figure: visualization of cuboid-based behavior recognition
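A minimal sketch of a gradient-based cuboid descriptor in the spirit of [3]: cut a small space-time cuboid around an interest point, compute brightness gradients along x, y and t, then flatten and L2-normalize them. The cuboid half-size is an assumed value, and a dimensionality-reduction step such as PCA would normally follow; it is omitted here.

```python
import numpy as np

def cuboid_descriptor(video, center, half_size=(3, 4, 4)):
    """Flattened, L2-normalized x/y/t brightness gradients of the cuboid around `center`.

    video: array of shape (T, H, W); center: (t, y, x); half_size: (ht, hy, hx).
    """
    video = np.asarray(video, dtype=float)
    lo = [c - h for c, h in zip(center, half_size)]
    hi = [c + h + 1 for c, h in zip(center, half_size)]
    if min(lo) < 0 or any(b > n for b, n in zip(hi, video.shape)):
        raise ValueError("cuboid extends beyond the video volume")
    cuboid = video[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    # np.gradient returns one derivative per axis, in axis order: t, y, x.
    gt, gy, gx = np.gradient(cuboid)
    descriptor = np.concatenate([gx.ravel(), gy.ravel(), gt.ravel()])
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor

# Toy video whose brightness increases linearly over time only.
video = np.broadcast_to(np.arange(20.0)[:, None, None], (20, 32, 32))
desc = cuboid_descriptor(video, center=(10, 16, 16))
print(desc.shape)  # (1701,) = 3 gradients x 7 x 9 x 9 voxels
```

Because the toy video changes only over time, the x and y parts of the descriptor are zero and all the energy sits in the t-gradient part.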

  6. Interest points belonging to different Gaussian components • Figure: example of interest points belonging to different Gaussian components in 8 sampled frames from the action “running”. Different colors denote different Gaussian components.

  7. Steps Overview • Represent the feature vectors with a Gaussian Mixture Model (GMM) • It accounts for the fact that human motion patterns are continuously distributed • It attempts to reveal the probabilistic structure of the local ST features • Use the MDL (Minimum Description Length) criterion to determine the number of mixture components and prevent over-fitting • Estimate the GMM with the Expectation-Maximization (EM) algorithm
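The EM step can be sketched in NumPy as follows. Assumptions: diagonal covariances (the slides do not state the covariance type), a fixed K, and a simple farthest-point initialization of the means instead of the MDL-based initialization the slides describe. Each iteration computes responsibilities (E-step) and then re-estimates weights, means and variances (M-step).

```python
import numpy as np

def _log_gauss_diag(X, means, variances):
    """log N(x_n; mu_k, diag(var_k)) for all n, k -> array of shape (N, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * ((diff ** 2 / variances[None]).sum(axis=2)
                   + np.log(2 * np.pi * variances).sum(axis=1)[None, :])

def _logsumexp(a):
    """Numerically stable log(sum(exp(a))) over axis 1."""
    m = a.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(a - m).sum(axis=1))

def fit_gmm_em(X, K, n_iter=200, tol=1e-8, reg=1e-6, seed=0):
    """Fit a K-component diagonal-covariance GMM to X (N, d) with EM."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Farthest-point initialization of the means (an illustrative choice).
    means = [X[rng.integers(N)]]
    for _ in range(1, K):
        d2 = ((X[:, None, :] - np.array(means)[None]) ** 2).sum(axis=2).min(axis=1)
        means.append(X[d2.argmax()])
    means = np.array(means, dtype=float)
    variances = np.tile(X.var(axis=0) + reg, (K, 1))
    weights = np.full(K, 1.0 / K)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities r_nk = p(component k | x_n).
        log_joint = _log_gauss_diag(X, means, variances) + np.log(weights)
        log_px = _logsumexp(log_joint)
        resp = np.exp(log_joint - log_px[:, None])
        # M-step: re-estimate weights, means and diagonal variances.
        Nk = resp.sum(axis=0) + 1e-12
        weights = Nk / N
        means = resp.T @ X / Nk[:, None]
        variances = np.maximum(resp.T @ X ** 2 / Nk[:, None] - means ** 2, 0) + reg
        loglik = log_px.sum()
        if loglik - prev < tol * abs(loglik):
            break
        prev = loglik
    # Log-likelihood of the final parameters.
    loglik = _logsumexp(_log_gauss_diag(X, means, variances) + np.log(weights)).sum()
    return weights, means, variances, loglik

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(10.0, 1.0, (200, 2))])
weights, means, variances, loglik = fit_gmm_em(X, K=2)
print(np.round(weights, 2), np.round(means, 1))
```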

  8. Probabilistic Generative Models: Gaussian Mixture Model • Mixture Model • Mixture Example • http://www.csse.monash.edu.au/~lloyd/Archive/2005-06-Mixture/
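Because a GMM is a generative model, drawing a sample is a two-stage process: pick a component k with probability π_k, then draw x from that component's Gaussian. A minimal NumPy sketch with diagonal covariances; the parameter values are arbitrary.

```python
import numpy as np

def sample_gmm(weights, means, variances, n, seed=0):
    """Draw n samples from a diagonal-covariance GMM; returns (samples, component labels)."""
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(weights), size=n, p=weights)  # stage 1: choose components
    noise = rng.standard_normal((n, means.shape[1]))
    samples = means[labels] + noise * np.sqrt(variances[labels])  # stage 2: Gaussian draw
    return samples, labels

weights = np.array([0.3, 0.7])
means = np.array([[0.0, 0.0], [5.0, -2.0]])
variances = np.array([[1.0, 1.0], [0.5, 2.0]])
X, z = sample_gmm(weights, means, variances, n=10000)
print(X.mean(axis=0))  # close to the mixture mean 0.3*mu_1 + 0.7*mu_2 = [3.5, -1.4]
```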

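  9. Using MDL to generate initial parameters for EM • Gaussian mixture model: $p(x \mid \Theta) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x; \mu_k, \Sigma_k)$, with mixing weights $\pi_k \ge 0$, $\sum_k \pi_k = 1$ • Log-likelihood function: $\log L(\Theta) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \mathcal{N}(x_n; \mu_k, \Sigma_k)$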
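The slides do not show the exact MDL formula. One common form (the two-part code length) scores a K-component model as $-\log L(\hat\Theta_K) + \frac{p_K}{2}\log N$, where $p_K$ is the number of free parameters and N the number of feature vectors, and picks the K with the smallest score. The sketch below assumes that form; the log-likelihood values in the usage example are made up.

```python
import math

def n_free_params(K, d, covariance="full"):
    """Free parameters of a K-component GMM in d dimensions."""
    cov = d * (d + 1) // 2 if covariance == "full" else d  # per-component covariance
    return (K - 1) + K * d + K * cov  # weights (sum to 1) + means + covariances

def mdl_score(loglik, K, d, N, covariance="full"):
    """Two-part description length: data cost -log L plus parameter cost (p/2) log N."""
    return -loglik + 0.5 * n_free_params(K, d, covariance) * math.log(N)

def select_num_components(logliks, d, N, covariance="full"):
    """Pick K minimizing MDL, given a dict {K: maximized log-likelihood}."""
    return min(logliks, key=lambda K: mdl_score(logliks[K], K, d, N, covariance))

# Hypothetical log-likelihoods of EM fits with K = 1, 2, 3 on N = 400 points in 2D.
logliks = {1: -1000.0, 2: -800.0, 3: -795.0}
print(select_num_components(logliks, d=2, N=400, covariance="diag"))  # 2
```

Going from K = 2 to K = 3 improves the log-likelihood by only 5, less than the extra parameter cost, so the penalty stops the model from over-fitting.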

  10. Optimal number of components • Figure: number of GMM components automatically selected by the MDL criterion on the KTH dataset

  11. Steps Overview • Compute the distance between two videos as the Kullback-Leibler (KL) divergence between their GMMs • Estimating it by Monte Carlo simulation is computationally too expensive • Use a variational approximation [12] to estimate the KL divergence
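For reference, the Monte Carlo estimator described here as too expensive draws n samples x_i from f and averages log f(x_i) − log g(x_i). Its error shrinks only as 1/√n, and every sample requires evaluating both mixtures. A minimal NumPy sketch, assuming diagonal-covariance GMMs given as (weights, means, variances) tuples:

```python
import numpy as np

def gmm_logpdf(X, weights, means, variances):
    """log p(x) of a diagonal-covariance GMM for each row of X."""
    diff = X[:, None, :] - means[None, :, :]
    log_comp = -0.5 * ((diff ** 2 / variances[None]).sum(axis=2)
                       + np.log(2 * np.pi * variances).sum(axis=1)) + np.log(weights)
    m = log_comp.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))

def kl_monte_carlo(f, g, n=100_000, seed=0):
    """Estimate D(f || g) = E_f[log f(x) - log g(x)] from n samples of f."""
    rng = np.random.default_rng(seed)
    weights, means, variances = f
    labels = rng.choice(len(weights), size=n, p=weights)
    X = means[labels] + rng.standard_normal((n, means.shape[1])) * np.sqrt(variances[labels])
    return float(np.mean(gmm_logpdf(X, *f) - gmm_logpdf(X, *g)))

f = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
g = (np.array([1.0]), np.array([[1.0]]), np.array([[1.0]]))
print(kl_monte_carlo(f, g))  # close to the exact value 0.5 for N(0,1) vs N(1,1)
```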

  12. KL divergence: Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models [12] • Definition of the KL divergence: $D(f \| g) = \int f(x) \log \frac{f(x)}{g(x)} \, dx$ • The KL divergence between two GMMs has no closed form • Use a variational approximation [12] to estimate it
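The variational approximation of [12] combines closed-form KL divergences between individual Gaussian components. For $f = \sum_a \pi_a f_a$ and $g = \sum_b \omega_b g_b$, $D_{var}(f \| g) = \sum_a \pi_a \log \frac{\sum_{a'} \pi_{a'} e^{-D(f_a \| f_{a'})}}{\sum_b \omega_b e^{-D(f_a \| g_b)}}$. Below is an illustrative NumPy sketch with full covariances, not the authors' implementation; GMMs are assumed to be given as (weights, means, covariances) tuples.

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """Closed-form D(N(mu0, S0) || N(mu1, S1))."""
    d = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d + logdet1 - logdet0)

def _logsumexp(a):
    """Numerically stable log(sum(exp(a))) for a 1D array."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def kl_variational(f, g):
    """Variational approximation of D(f || g) for GMMs given as (weights, means, covs)."""
    wf, muf, Sf = f
    wg, mug, Sg = g
    total = 0.0
    for a in range(len(wf)):
        # log sum_a' pi_a' exp(-D(f_a || f_a'))
        log_num = _logsumexp(np.array([np.log(wf[j]) - kl_gauss(muf[a], Sf[a], muf[j], Sf[j])
                                       for j in range(len(wf))]))
        # log sum_b omega_b exp(-D(f_a || g_b))
        log_den = _logsumexp(np.array([np.log(wg[b]) - kl_gauss(muf[a], Sf[a], mug[b], Sg[b])
                                       for b in range(len(wg))]))
        total += wf[a] * (log_num - log_den)
    return float(total)

f = (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [3.0, 0.0]]), np.array([np.eye(2), np.eye(2)]))
g = (np.array([1.0]), np.array([[1.0, 0.0]]), np.array([2.0 * np.eye(2)]))
print(kl_variational(f, g))
```

The cost is K_f·(K_f + K_g) Gaussian-to-Gaussian divergences, independent of any sample count, which is why it is much cheaper than Monte Carlo. It returns 0 for identical mixtures and reduces to the exact KL divergence when both mixtures have a single component.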

  13. Experiments • The KTH and UCF Sports datasets are used • The average recognition accuracy is used as the evaluation criterion

  14. Average recognition accuracies of three tests on KTH

  15. Average recognition accuracies of three approaches on UCF Sports

  16. Confusion matrices on (a) KTH and (b) UCF Sports
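The slides report average recognition accuracy alongside confusion matrices. Assuming "average accuracy" means the mean of the per-class accuracies (the diagonal of the row-normalized confusion matrix), which the slides do not state explicitly, both can be computed from true and predicted labels as follows. The labels in the usage lines are toy values.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix: entry (i, j) = fraction of class i predicted as j.

    Assumes every class occurs at least once in y_true.
    """
    cm = np.zeros((n_classes, n_classes))
    np.add.at(cm, (y_true, y_pred), 1)
    return cm / cm.sum(axis=1, keepdims=True)

def average_accuracy(y_true, y_pred, n_classes):
    """Mean per-class accuracy, i.e. the mean of the confusion-matrix diagonal."""
    return float(np.mean(np.diag(confusion_matrix(y_true, y_pred, n_classes))))

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
print(confusion_matrix(y_true, y_pred, 3))
print(average_accuracy(y_true, y_pred, 3))
```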

  17. Conclusion • Exploits probability distributions (GMMs) to encode local ST features • The resulting representation is compatible with most discriminative classifiers
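The slides do not specify how the KL-based distance is fed to a discriminative classifier. One common option for kernel classifiers such as SVMs is to symmetrize the divergence and exponentiate it, $K(f, g) = \exp(-\gamma\,[D(f\|g) + D(g\|f)])$. The sketch assumes a precomputed matrix of pairwise divergences; `gamma` is a hypothetical bandwidth parameter, and such a kernel is not guaranteed to be positive semidefinite.

```python
import numpy as np

def kl_kernel(D, gamma=1.0):
    """Kernel matrix exp(-gamma * (D + D^T)) from pairwise divergences D[i, j] = D(f_i || f_j)."""
    D = np.asarray(D, dtype=float)
    sym = D + D.T  # symmetrized KL; the diagonal is 0 because D(f || f) = 0
    return np.exp(-gamma * sym)

# Toy asymmetric divergence matrix for three videos (illustrative values).
D = np.array([[0.0, 0.4, 2.0],
              [0.6, 0.0, 1.5],
              [1.8, 1.1, 0.0]])
K = kl_kernel(D, gamma=0.5)
print(np.round(K, 3))
# A classifier that accepts a precomputed kernel matrix could then be trained on K (not shown).
```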
