Toward Learning Mixture-of-Parts Pictorial Structures

Robin Hess and Alan Fern. School of Electrical Engineering and Computer Science, Oregon State University.

Presentation Transcript


  1. Robin Hess and Alan Fern Toward Learning Mixture-of-Parts Pictorial Structures School of Electrical Engineering and Computer Science Oregon State University

  2. Talk Objectives • Overview of the OSU Digital Scout Project • Describe the problem of initial formation labelling • Representational and inference challenges • Mixture-of-Parts Pictorial Structures • Model definition • Inference • Opportunities for learning • Parameters and structure • Speedup learning • Active learning • Transfer learning

  3. The OSU Digital Scout Project • Objective: compute semantic interpretations of football video (from raw video to a high-level interpretation of the play) • Professional and college teams spend many hours attaching semantic tags to video for database access • We want to make this process much more automatic • Support computer-assisted strategic analysis of opponents • Previous work: S. Intille. Visual Recognition of Multi-Agent Action. PhD Thesis, MIT, 1999.

  4. Raw Video Data • Obtained several games' worth of home-field video from the OSU football team • One video file per play • Exact same video used by coaches • Video shot from a single fixed location at the top of Reser Stadium • Camera is constantly panning and zooming

  5. Registered Video Data • Semantic interpretation requires registration of video data to football field coordinates • Developed a robust registration approach (planar homography) [Hess & Fern, CVPR’07]
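A minimal sketch of what registration by planar homography means in practice: a 3x3 matrix maps an image pixel to field coordinates via homogeneous coordinates. The function name `apply_homography` and the example matrix are illustrative assumptions; this is not the registration method of the cited CVPR'07 paper, only the final coordinate mapping once a homography is known.

```python
def apply_homography(H, x, y):
    """Map image pixel (x, y) to field coordinates using a 3x3 homography H.

    H is a list of three rows of three numbers (hypothetical input format).
    """
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to get 2-D field coordinates.
    return u / w, v / w


# Usage with a made-up homography that scales by 0.5 and shifts by (10, 20).
H_example = [[0.5, 0.0, 10.0],
             [0.0, 0.5, 20.0],
             [0.0, 0.0, 1.0]]
field_x, field_y = apply_homography(H_example, 100, 40)
```

Because the camera pans and zooms, a different matrix would be needed for each frame.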

  6. Problem: Formation Labelling • We consider a subproblem of full play interpretation • Given: initial registered video frame of a play • Output: offensive formation, i.e. the types and locations of the 11 offensive players • Thousands of possible formations

  7. Challenges in Formation Labelling • Player appearances nearly identical • Appearance not useful for inferring player type • Difficult to robustly segment individual players • “part detector” style approaches are difficult to apply

  8. Challenges in Formation Labelling • Different formations can differ in subtle ways

  9. Problem Constraints • A number of hard constraints are imposed by the rule book • Exactly 11 players • Exactly 7 players on the line and 4 players behind the line • Exactly 1 quarterback and 1 center • Center is located at midfield or on a “hash line”
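The counting constraints above can be checked mechanically. The sketch below is an illustration under assumptions: the `(type, on_line)` input format is hypothetical, treating both `QB` and `QBS` as quarterback types is a guess based on the type list on slide 12, and the constraint on the center's location is omitted.

```python
from collections import Counter

# Assumption: both labels denote a quarterback; the talk does not define QBS.
QUARTERBACK_TYPES = {"QB", "QBS"}


def is_legal_part_set(players):
    """Check the counting constraints for an offensive formation.

    players: list of (player_type, on_line) pairs, where on_line is True
    for players on the line of scrimmage.
    """
    if len(players) != 11:
        return False
    on_line = sum(1 for _, flag in players if flag)
    if on_line != 7:  # the remaining 4 are then behind the line
        return False
    counts = Counter(player_type for player_type, _ in players)
    quarterbacks = sum(counts[t] for t in QUARTERBACK_TYPES)
    return quarterbacks == 1 and counts["C"] == 1
```

For example, seven line players including exactly one `C`, plus one `QB` and three other backfield players, pass the check; dropping any player makes it fail.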

  10. Problem Constraints • Soft constraints on relative spatial locations of players • Constraints strongly depend on the set of player types

  11. Previous Attempt • S. Intille. Visual Recognition of Multi-Agent Action. PhD Thesis, MIT, 1999. • Intille used a knowledge base (KB) of hard constraints to cast formation labelling as a SAT-like problem • Constraints: “near”, “to the left of”, “bit of vertical space between”, etc. • Simplified the problem by hand-labelling the field locations of the 11 players • Only tried to infer player types • The approach did not work well and was abandoned

  12. Structured Output Representations • Infer type & location for all 11 players (22 output variables) • t_i ∈ {QBS, QB, C, LG, RG, LTE, ...}, 34 types • l_i ∈ {(0,0), (0,1), …, (n,m)}, pixel location • Our representation must capture • Hard joint constraints among types • Soft joint constraints among locations conditioned on types and image data • Possible to encode constraints via standard discrete factor-graph models (e.g. CRFs, weighted CSPs, ILP, etc.) • Such encodings appear problematic with respect to off-the-shelf inference techniques (?) • Domains of variables are huge (many values) • Large factors (e.g. exactly 7 “line type” players) • Location constraints are inherently numeric

  13. Pictorial Structures • Offensive formations can be viewed as multi-part articulated objects (parts correspond to players) • Pictorial structure models have been successful for multi-part objects in computer vision • Local part appearance models • Deformable connections • Joint estimation of part locations • Simply pairwise graphical models whose node values are part locations • Figure courtesy of Fischler & Elschlager
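For reference, the standard pictorial-structures objective (in the commonly used Felzenszwalb and Huttenlocher formulation) can be written as below. The notation is assumed here rather than taken from the talk, whose own equation is not preserved in this transcript: m_i is the local appearance cost of part i at location l_i, E is the set of connected part pairs, and d_ij is the deformation cost between connected parts.

```latex
L^{*} = \arg\min_{L = (l_1, \dots, l_n)}
  \left( \sum_{i=1}^{n} m_i(l_i) \;+\; \sum_{(i,j) \in E} d_{ij}(l_i, l_j) \right)
```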

  15. When the edge structure forms a tree, dynamic programming (DP) can compute the MAP estimate in O(nh^2) time • n: number of parts, h: number of pixels • h^2 is often impractical • If, in addition, d_ij(·,·) is a Mahalanobis distance, the computation can be done in O(nh) time
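The O(nh^2) dynamic program can be sketched as a min-sum pass from the leaves to the root followed by backtracking. This is a generic illustration, not the talk's implementation: the function name `map_tree` and its argument format are assumptions, costs are minimized (so lower cost corresponds to higher probability), and the O(nh) distance-transform speedup for Mahalanobis costs is not shown.

```python
def map_tree(children, root, locations, unary, pairwise):
    """Min-cost (MAP) part locations for a tree-structured model.

    children: dict mapping a part to a list of its child parts
    locations: list of hashable candidate locations (h of them)
    unary(part, loc): appearance cost of placing part at loc
    pairwise(parent, child, parent_loc, child_loc): deformation cost
    Runs in O(n * h^2) time for n parts.
    """
    best_child_loc = {}  # best_child_loc[child][parent_loc] -> child location

    def subtree_cost(part):
        # cost[loc]: cheapest cost of the subtree rooted at `part` placed at loc
        cost = {loc: unary(part, loc) for loc in locations}
        for child in children.get(part, []):
            child_cost = subtree_cost(child)
            best_child_loc[child] = {}
            for ploc in locations:
                totals = {cloc: child_cost[cloc] + pairwise(part, child, ploc, cloc)
                          for cloc in locations}
                best = min(totals, key=totals.get)
                best_child_loc[child][ploc] = best
                cost[ploc] += totals[best]
        return cost

    root_cost = subtree_cost(root)
    root_loc = min(root_cost, key=root_cost.get)

    # Backtrack from the root to recover every part's location.
    assignment = {root: root_loc}
    stack = [root]
    while stack:
        part = stack.pop()
        for child in children.get(part, []):
            assignment[child] = best_child_loc[child][assignment[part]]
            stack.append(child)
    return root_cost[root_loc], assignment
```

The inner loop over all (parent location, child location) pairs is the h^2 factor, which is why the slide calls it impractical when h is the number of pixels.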

  16. Pictorial Structures for Football • For a fixed set of player types, locations can be well approximated by a pictorial structure • But part sets (i.e. player types) vary across plays • Can’t use standard pictorial structures for our problem • Can we still leverage the benefits of pictorial structures?

  17. Mixture of Parts Pictorial Structures (MoPPS) • Captures constraints on legal part sets via pv • Captures spatial constraints among parts via f

  18. MoPPS Inference • Find the MAP estimate: the most likely set of parts and their locations • Worst case: evaluate the pictorial structure of each legal part set • Requires over an hour of processing for our problem • Need a structured MoPPS representation that can be exploited for fast inference • We use a “MoPPS Tree”

  19. MoPPS Tree Representation • The pictorial structure for a legal part set is the projection of a global tree onto that part set

  20. MoPPS Tree for Football • 34 parts in model (one for each possible player type) • Includes local observation models • Includes pairwise spatial constraints • Also provide constraints for evaluating legal part sets

  21. MoPPS Tree Inference • Becomes combinatorial optimization over legal part sets • We use Branch-and-Bound Search (BBS)

  22. Branch-and-Bound Search • Search nodes are part sets • Internal nodes represent sets of legal part sets • Leaves are legal part sets • While solution not found • Expand the least node according to the ordering relation • Compute upper and lower bounds • Prune any dominated node

  23. Lower Bound Computations • Monotonicity: adding to a set of parts will never result in reduced cost • Simply compute pictorial structure match of tree projected on parts in search node • Can improve on this by adding cost for “missing parts”

  24. Upper Bound Computations • Match entire MoPPS tree to image data • Use as a heuristic for quickly finding legal completion of current part set • Cost of completion is upper bound
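The search loop and the two bounds described on the preceding slides fit a generic best-first branch-and-bound template, sketched below. Everything here is an illustrative assumption rather than the authors' code: the callback names, the choice to order the frontier by lower bound, and the convention that a leaf's lower bound equals its true cost.

```python
import heapq
import itertools


def branch_and_bound(root, expand, is_leaf, lower_bound, upper_bound):
    """Best-first branch-and-bound that minimizes cost.

    expand(node) -> iterable of child nodes (each with a legal completion)
    is_leaf(node) -> True if node is a complete, legal part set
    lower_bound(node) -> cost that no completion of node can beat;
                         assumed exact when node is a leaf
    upper_bound(node) -> (cost, leaf) of some legal completion of node
    """
    best_cost, best_leaf = upper_bound(root)
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    frontier = [(lower_bound(root), next(tie), root)]
    while frontier:
        bound, _, node = heapq.heappop(frontier)
        if bound >= best_cost:
            break  # every remaining node is dominated by the incumbent
        if is_leaf(node):
            best_cost, best_leaf = bound, node
            continue
        for child in expand(node):
            child_bound = lower_bound(child)
            if child_bound >= best_cost:
                continue  # prune dominated child
            completion_cost, completion = upper_bound(child)
            if completion_cost < best_cost:
                best_cost, best_leaf = completion_cost, completion
            heapq.heappush(frontier, (child_bound, next(tie), child))
    return best_cost, best_leaf
```

Because an incumbent solution exists from the first upper-bound call, the loop can be stopped at any point and still return a legal part set, which is consistent with the anytime behavior reported later in the talk.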

  25. MoPPS Tree Parameters for Football • 34 parts, 3200+ legal formations • 16 basic player types plus subtypes • Connections modeled as a Gaussian over the ideal location relative to the “parent” player • Parameters manually set using training images • Observation model uses two independent components • One based on a background model • One based on color histogramming
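A Gaussian over the child's location relative to its parent corresponds, up to an additive constant and a factor of one half in the negative log-likelihood, to a squared Mahalanobis distance. The sketch below assumes a diagonal covariance and made-up parameter names purely for illustration; the talk does not state the covariance structure.

```python
def connection_cost(parent_loc, child_loc, ideal_offset, variances):
    """Squared Mahalanobis distance of the child's offset from its ideal offset.

    parent_loc, child_loc: (x, y) field coordinates
    ideal_offset: (dx, dy) expected child position relative to the parent
    variances: (var_x, var_y), diagonal covariance (an assumption)
    """
    dx = child_loc[0] - parent_loc[0] - ideal_offset[0]
    dy = child_loc[1] - parent_loc[1] - ideal_offset[1]
    return dx * dx / variances[0] + dy * dy / variances[1]
```

A child sitting exactly at its ideal offset has cost 0, and the cost grows quadratically with the deviation, more slowly along directions with larger variance.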

  26. Background Model • Register lots of video to field model • Learn kernel density estimate of color at each pixel
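A per-pixel kernel density estimate can be sketched as follows. This is a simplified illustration under assumptions: it uses a single intensity channel instead of full color, a Gaussian kernel, and an arbitrary bandwidth, none of which are specified in the talk. The helper names are hypothetical.

```python
import math


def kde_density(samples, x, bandwidth):
    """Gaussian kernel density estimate at x from 1-D samples."""
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


def background_cost(samples, observed, bandwidth=5.0, eps=1e-12):
    """Negative log-density of an observed value under the background model.

    samples: values seen at one field location across registered frames.
    A high cost suggests the pixel is not background (e.g. a player).
    """
    return -math.log(kde_density(samples, observed, bandwidth) + eps)
```

In use, `samples` would be collected per field location from many registered frames, and `background_cost` evaluated on the frame being labelled.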

  29. Results

  30. Anytime Behavior: % Correct • Exhaustive search requires close to an hour • Greedy search is fast but achieves only 80% accuracy • Mean-squared location error less than a yard

  31. Directions: Learning MoPPS Models • Successfully hand-coded a MoPPS model • Was quite time consuming to get parameters right • Motivates supervised structure and parameter learning • MoPPS model takes an average of 4 minutes per play • Still too slow for weekly volume of game video • Motivates speedup learning • MoPPS model will sometimes need to be relearned/adapted to different sets of video • Want to reduce labelling effort • Motivates active and transfer learning

  32. Structure and Parameter Learning • Goal: learn structure and parameters of MoPPS tree from labelled data • Assume hard constraints on legal part sets provided • There are algorithms for learning the structure of pictorial structures • Can easily modify to learn MoPPS tree • Easy to combine with generative parameter learning

  33. Structure and Parameter Learning • Issue: pure generative parameter learning will not likely be sufficient • Hand-coded model incorporates “reward terms” to make up for deficiencies in the generative observation model • Suggests augmenting the generative model with discriminatively trained components • Issue: inference time of 4 minutes makes most generative training methods quite expensive • Suggests using approaches that do not perform full joint inference for each parameter update

  34. Speedup Learning • How can we speed up branch-and-bound search? • There are a number of interesting settings • Setting 1: • Given a MoPPS model & upper/lower bound functions • Learn effective search space operators • Setting 2: • Given a MoPPS model & search space • Learn more accurate upper/lower bound functions • Setting 3: • Given a MoPPS model & search space & possibly bounds • Learn an effective priority queue ranking function

  35. Active Model Calibration • Want to minimize labelling effort for a new video set • Active learning and/or semi-supervised learning • Want to leverage experience with previous videos • Transfer learning • How can we combine these two paradigms for label-efficient active model calibration? • User interface is also critical • Very rough idea: • Assume fixed model structure • Learn prior on parameters from previous data sets • Use prior for regularization and example selection

  36. Summary and Future Work • New structured output challenge problem • We will provide a labelled data set • Can off-the-shelf structured learning approaches work? • Suggests investigating lesser-studied directions • Speedup learning • Active calibration • On the horizon • Applying to defensive formations • Full temporal play interpretation • Mining strategic knowledge • Strategic planning

  37. The Digital Scout Project http://eecs.oregonstate.edu/football
