Mohak Kumar Sukhwani 201307583 Advisor: Prof. C. V. Jawahar

Understanding and Describing Tennis Videos. Mohak Kumar Sukhwani 201307583 Advisor: Prof. C. V. Jawahar Center for Visual Information Technology, IIIT-Hyderabad, India. Sports Video Analysis. Cricket : Temporal segmentation and annotation of actions with semantic descriptions.

Presentation Transcript


  1. Understanding and Describing Tennis Videos Mohak Kumar Sukhwani 201307583 Advisor: Prof. C. V. Jawahar Center for Visual Information Technology, IIIT-Hyderabad, India

  2. Sports Video Analysis. Cricket: Temporal segmentation and annotation of actions with semantic descriptions. Snooker and volleyball: (Left) Analysis of shot trajectories and stroke analysis. (Right) Player identification and action recognition.

  3. Ice hockey: Player recognition and tracking on field. Soccer: Real-time football analysis, including automatic game summarization, player tracking, and highlight extraction.

  4. Handball: Trajectory-based handball video understanding. Basketball: Tracking players under global appearance constraints.

  5. Computer Vision and Language Processing < video slide – motivation > How will you describe it?

  6. Visual-Semantic Alignments (Varied Approaches)

  7. Our Approach. Descriptions: IN, Winner: Serena!!! High kick serve, Williams returns a backhand return, short rally, Sharapova cross-court backhand lands outside the court.

  8. Tennis Data New Video

  9. Does confining the domain help? Frequency comparison of unrestricted tennis text (tennis news, blogs, etc., denoted by `*`) with tennis commentaries.

  10. Phrase Recognition; Description Retrieval; Action Recognition; Action Localization

  11. Dataset (a) Annotated-action. (b) Video commentary.

  12. Text Corpus Source: Tennis Earth - http://www.tennisearth.com/.

  13. Action Localization and Player Detection. Phrase recognition accuracy averaged over top-5 retrievals. Player detection on test videos.

  14. Player Recognition • color based descriptors (MPEG-7 SCD, CLD) • edge based descriptor (MPEG-7 EHD) • color and texture information (MPEG-7-like CEDD)

  15. Weak Learners for Action Recognition. Feature Extraction (Dense Trajectory); Encoding and Pooling (Bag of Words); Discriminative Classifier (Multiclass SVM). Level of semantics: Activity (waits for ball, serves a good one, crafts a forehand return); Action (forehand, backhand, volley).
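
The encoding-and-pooling stage named on slide 15 can be illustrated with a minimal sketch. It assumes hard assignment of each local descriptor to its nearest codeword and an L1-normalised histogram; the slide does not state the assignment rule or the codebook size, and the toy 2-D descriptors and three-word codebook below are invented for illustration only.

```python
from collections import Counter
from math import dist


def encode_bow(descriptors, codebook):
    """Hard-assign each descriptor to its nearest codeword and return an
    L1-normalised histogram over the codebook (one bin per codeword)."""
    counts = Counter(
        min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        for d in descriptors
    )
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[k] / total for k in range(len(codebook))]


# Hypothetical toy data: real trajectory descriptors are high-dimensional.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
histogram = encode_bow(descriptors, codebook)
```

The resulting fixed-length histogram is what a multiclass SVM would consume, one vector per video clip.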

  16. Improved Dense Trajectories as a feature vector! Dense sampling in each spatial scale; trajectory-aligned descriptors; feature tracking. - Capture the intrinsic dynamic structures in video - MBH is robust to camera motion - Detect human body to remove spurious trajectories

  17. What's with camera motion? Separate models for upper and lower action!

  18. We are already done with training!

  19. We test on tennis point videos. Components: SVM score; pairwise phrase cohesion; MRF-based temporal smoothing; retrieval module.
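
One way to read the temporal smoothing on slide 19 is as MAP inference on a chain MRF, where per-window classifier scores are the unary terms and phrase cohesion is the pairwise term. The sketch below makes that assumption and solves it with the Viterbi algorithm; the slide does not give the actual potentials or the inference method, and all scores here are made up.

```python
def smooth_labels(unary, pairwise):
    """MAP labelling of a chain MRF via Viterbi.

    unary[t][k]    -- classifier score of label k at window t (higher is better)
    pairwise[j][k] -- cohesion score for label j followed by label k
    """
    n_labels = len(unary[0])
    best = list(unary[0])   # best[k]: score of the best path ending in label k
    back = []               # back[t][k]: best predecessor of label k at step t+1
    for scores in unary[1:]:
        prev = best
        pointers, best = [], []
        for k in range(n_labels):
            j = max(range(n_labels), key=lambda j: prev[j] + pairwise[j][k])
            pointers.append(j)
            best.append(prev[j] + pairwise[j][k] + scores[k])
        back.append(pointers)
    label = max(range(n_labels), key=lambda k: best[k])
    path = [label]
    for pointers in reversed(back):  # follow back-pointers to the first window
        label = pointers[label]
        path.append(label)
    return path[::-1]


# Hypothetical scores for 4 sliding windows and 2 phrase labels.
unary = [[2.0, 0.0], [1.5, 0.2], [0.4, 0.6], [1.8, 0.1]]
pairwise = [[0.5, -0.5], [-0.5, 0.5]]  # reward keeping the same label
raw = [max(range(2), key=lambda k: s[k]) for s in unary]
smoothed = smooth_labels(unary, pairwise)
```

In this toy case the per-window argmax flips to label 1 at the third window, while the smoothed sequence keeps label 0 throughout because the weak evidence does not outweigh the cohesion penalty.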

  20. How about a joint model for phrase classification? - Semi-automatic process for phrase alignment. - No manual shot sampling. - No tiring action annotations.

  21. Commentary Text: 9 phrase encodings. (subject), (object), (subject;verb), (object;verb), (subject;prep;object), (object;prep;object), (attribute;subject), (attribute;object) and (verb;prep;object).

  22. Commentary: IN, Winner: Serena!!! Huge serve. Ace!!! Phrases: <winner Serena>, <huge serve>, <ace>. Commentary: IN, Winner: Zvonareva!!! Good serve in the middle, Williams returns a quick forehand return, short rally, Serena cross-court fails to clear the net in the middle. Phrases: <winner Zvonareva>, <quick return>, <short rally>, <Williams return return>, <cross-court fail>.

  23. Probabilistic Label Consistent KSVD. [Figure: tennis point video; sliding window; action trajectory matrix; phrase label matrix; sparse code; optimal dictionary; matrices Y and H (entries not reproduced in this transcript). PC: phrase cluster.]
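
For orientation, the standard (non-probabilistic) label-consistent K-SVD objective is written out below. This is the textbook formulation and not necessarily the exact variant used on slide 23, whose probabilistic extension is not spelled out in this transcript. Mapping Y to the input feature matrix and H to the label matrix follows the usual LC-KSVD notation and is an assumption here; Q (discriminative sparse-code targets), A (linear transform), W (classifier weights), the weights α and β, and the sparsity level T are also symbols from the standard formulation, not from the slide.

```latex
\langle D, W, A, X \rangle
  = \arg\min_{D,\,W,\,A,\,X}
    \|Y - DX\|_F^2
    + \alpha \,\|Q - AX\|_F^2
    + \beta \,\|H - WX\|_F^2
  \quad \text{s.t. } \forall i,\ \|x_i\|_0 \le T
```

Here D is the learned dictionary and X holds the sparse codes, one column x_i per input sample.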

  24. For test videos,

  25. Commentary generation. Offline: commentary collection; representation [tf-idf/LSI]; document representation; index. Online: phrases + players; representation [tf-idf/LSI]; query representation. Comparison function [TF-IDF/LSI]. Voila!
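
The retrieval step on slide 25 can be sketched with plain tf-idf and cosine similarity. This is a minimal illustration, not the authors' implementation: the tokeniser, the smoothed idf formula, and the function names are assumptions, and the three-line corpus reuses commentary lines quoted elsewhere in these slides.

```python
import math
import re
from collections import Counter


def tokenise(text):
    return re.findall(r"[a-z-]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict) per document, plus the idf table."""
    tokenised = [tokenise(doc) for doc in docs]
    n = len(tokenised)
    df = Counter(term for tokens in tokenised for term in set(tokens))
    # Smoothed idf (an assumed variant); any monotone idf would do here.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()}
               for tokens in tokenised]
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def retrieve(query_phrases, docs):
    """Index of the commentary line most similar to the predicted phrases."""
    vectors, idf = tfidf_vectors(docs)
    counts = Counter(tokenise(" ".join(query_phrases)))
    query = {t: c * idf[t] for t, c in counts.items() if t in idf}
    return max(range(len(docs)), key=lambda i: cosine(query, vectors[i]))


corpus = [
    "Huge serve. Ace!",
    "Good serve in the middle, Williams returns a quick forehand return, "
    "short rally, Serena cross-court fails to clear the net in the middle.",
    "High kick serve, Williams returns a backhand return, short rally, "
    "Sharapova cross-court backhand lands outside the court.",
]
best_for_ace = retrieve(["huge serve", "ace"], corpus)
best_for_rally = retrieve(["quick return", "short rally"], corpus)
```

In a real system the document vectors would be built once offline and only the query vector computed online, which matches the offline/online split on the slide.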

  26. LSI for better text retrieval. SVD takes term-based indexing to latent-concept-based indexing (n > k). • Map documents (and terms) to a low-dimensional representation. • Design a mapping such that the low-dimensional space reflects semantic associations (latent semantic space). • Compute document similarity based on the inner product in this latent semantic space.
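
The three bullets on slide 26 correspond to the usual LSI equations, written out below as a sketch. The symbols are assumptions, since the slide only shows "SVD" and "n > k": A is the term-document matrix with n term dimensions, k is the number of latent concepts retained, q is a query vector in term space, and d_j is the j-th document column of A.

```latex
\begin{align}
  A &\approx A_k = U_k \Sigma_k V_k^{\top}, \qquad k < n \\
  \hat{q} &= \Sigma_k^{-1} U_k^{\top} q, \qquad
  \hat{d}_j = \Sigma_k^{-1} U_k^{\top} d_j \\
  \mathrm{sim}(q, d_j) &= \hat{q}^{\top} \hat{d}_j
\end{align}
```

The first line is the rank-k truncated SVD, the second folds queries and documents into the k-dimensional latent space, and the third is the inner-product similarity named in the last bullet.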

  27. Illustration of the approach Input sequence of videos is first translated into a set of phrases, which are then used to produce the final description

  28. Quantitative Comparisons Template based CNN + RNN CCA + Semantic Correlation matching CCA + SSVM

  29. The premise is indeed true. Confined domain does help!

  30. Qualitative Results

  31. Qualitative Comparisons. Youtube2Text: S. Guadarrama, N. Krishnamoorthy, G. Malkarnenkar, S. Venugopalan, R. Mooney, T. Darrell and K. Saenko. Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In ICCV, 2013. RNN: A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.

  32. Human Evaluation

  33. Contribution

  34. < video slide >

  35. Other Applications. 1. Smart theatrics: Narration generation for dance dramas (Ballet, Kathak, Kabuki). 2. Sports: Other sporting events (Baseball, Volleyball, Cricket).

  36. Possible Extensions! Longer text: more realistic and exhaustive game description (requires better topic modelling and retrieval methods). Data collection is a challenge: too many variations.

  37. Ball Tracking. Tried simple Kalman filtering. How about RNNs? Will it actually help and add to content understanding?
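
For context on what "simple Kalman filtering" typically involves, here is a minimal sketch of a constant-velocity Kalman filter over one image coordinate. The slide gives no details, so the motion model, the noise parameters `q` and `r`, the initial covariance, and the function name are all assumptions; a ball tracker would run one such filter per axis or use a joint 2-D state.

```python
def kalman_track(measurements, dt=1.0, q=1e-3, r=0.25):
    """Filter noisy 1-D positions with a constant-velocity Kalman filter.

    State is (position, velocity); only position is observed.
    q is the process-noise variance, r the measurement-noise variance.
    Returns a list of (position, velocity) estimates, one per measurement.
    """
    p, v = measurements[0], 0.0
    p00, p01, p10, p11 = 1.0, 0.0, 0.0, 1.0  # state covariance P
    estimates = []
    for z in measurements:
        # Predict: x = F x and P = F P F^T + q*I, with F = [[1, dt], [0, 1]].
        p = p + dt * v
        n00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
        n01 = p01 + dt * p11
        n10 = p10 + dt * p11
        n11 = p11 + q
        p00, p01, p10, p11 = n00, n01, n10, n11
        # Update with measurement z of the position (H = [1, 0]).
        s = p00 + r                      # innovation variance
        k0, k1 = p00 / s, p10 / s        # Kalman gain
        residual = z - p
        p += k0 * residual
        v += k1 * residual
        # P = (I - K H) P, computed from the pre-update covariance.
        p00, p01, p10, p11 = ((1 - k0) * p00, (1 - k0) * p01,
                              p10 - k1 * p00, p11 - k1 * p01)
        estimates.append((p, v))
    return estimates


# Hypothetical ball moving 2 pixels per frame along one axis.
track = kalman_track([2.0 * t for t in range(30)])
final_position, final_velocity = track[-1]
```

A constant-velocity model of this kind cannot follow bounces or racket hits, which is one plausible reason the slide raises RNNs as an alternative.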

  38. Related Publications • Mohak Sukhwani and C. V. Jawahar, Tennis Vid2Text: Fine-Grained Descriptions for Domain Specific Videos, Proceedings of the 26th British Machine Vision Conference (BMVC), 07-10 Sep 2015, Swansea, UK. • Mohak Sukhwani and C. V. Jawahar, Frame Level Annotations for Tennis Videos, 23rd International Conference on Pattern Recognition, ICPR 2016 (Under Review).

  39. < video slide – human evaluation >
