Toward Learning Mixture-of-Parts Pictorial Structures

Robin Hess and Alan Fern, School of Electrical Engineering and Computer Science, Oregon State University

Talk Objectives • Overview of the OSU Digital Scout Project • Describe problem of initial formation labelling • Representational and inference challenges • Mixture-of-Parts Pictorial Structures • Model definition • Inference • Opportunities for learning • Parameters and structure • Speedup learning • Active learning • Transfer learning

The OSU Digital Scout Project • Objective: compute semantic interpretations of football video (raw video in, high-level interpretation of play out) • Professional/college teams spend many hours attaching semantic tags to video for DB access • We want to make this process much more automatic • Support computer-assisted strategic analysis of opponents • Previous work: S. Intille. Visual Recognition of Multi-Agent Action. PhD Thesis, MIT, 1999.

Raw Video Data • Obtained several games' worth of home-field video from the OSU football team • One video file per play • Exact same video used by coaches • Video shot from a single fixed location at the top of Reser Stadium • Camera is constantly panning and zooming

Registered Video Data • Semantic interpretation requires registration of video data to football field coordinates (a planar homography) • Developed robust registration approach [Hess & Fern, CVPR’07]
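As a rough illustration of what registration buys us, the sketch below maps an image pixel to field coordinates with a 3x3 planar homography. The function name `apply_homography` and the matrix `H_EXAMPLE` are made up for this example; the slides do not say how the registration matrix is estimated or stored.

```python
def apply_homography(H, x, y):
    """Map image pixel (x, y) to field coordinates using a 3x3 homography H."""
    # Homogeneous coordinates: [u, v, w]^T = H [x, y, 1]^T
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Divide out the projective scale to get 2-D field coordinates.
    return u / w, v / w


# Hypothetical matrix for illustration only: scale pixels by 0.1 and shift.
H_EXAMPLE = [
    [0.1, 0.0, 5.0],
    [0.0, 0.1, 2.0],
    [0.0, 0.0, 1.0],
]
field_x, field_y = apply_homography(H_EXAMPLE, 100, 50)
```

Because the camera pans and zooms, a real system would need a different matrix for every frame; here a single fixed matrix stands in for one registered frame.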

Problem: Formation Labelling • We consider a subproblem of full play interpretation • Given: initial registered video frame of a play • Output: offensive formation, i.e. types and locations of the 11 offensive players • Thousands of possible formations

Challenges in Formation Labelling • Player appearances nearly identical • Appearance not useful for inferring player type • Difficult to robustly segment individual players • “part detector” style approaches are difficult to apply

Challenges in Formation Labelling • Different formations can differ in subtle ways

Problem Constraints • A number of hard constraints imposed by the rule book • Exactly 11 players • Exactly 7 players on the line and 4 players behind the line • Exactly 1 quarterback and 1 center • Center is located at midfield or on a “hash line”
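The rule-book constraints above amount to a simple legality test on a set of player types. The sketch below is only illustrative: apart from QBS, QB, C, LG, RG and LTE, the type names are invented, and the split of types into "on the line" and "behind the line" is an assumption, not the model's actual vocabulary.

```python
from collections import Counter

# Hypothetical type vocabulary. Which types count as "on the line" is an
# assumption made for this sketch, not taken from the talk.
LINE_TYPES = {"C", "LG", "RG", "LT", "RT", "LTE", "RTE"}
QUARTERBACK_TYPES = {"QB", "QBS"}


def is_legal_part_set(types):
    """Check the hard constraints: 11 players, 7 on the line, 4 behind,
    exactly one quarterback and exactly one center."""
    counts = Counter(types)
    n_line = sum(1 for t in types if t in LINE_TYPES)
    n_qb = sum(counts[t] for t in QUARTERBACK_TYPES)
    return (
        len(types) == 11
        and n_line == 7
        and len(types) - n_line == 4
        and n_qb == 1
        and counts["C"] == 1
    )


# Illustrative formation; HB, FB and WR are made-up backfield type names.
EXAMPLE = ["C", "LG", "RG", "LT", "RT", "LTE", "RTE", "QB", "HB", "FB", "WR"]
legal = is_legal_part_set(EXAMPLE)
```

The location constraint on the center is left out because it depends on image coordinates rather than on the part set alone.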

Problem Constraints • Soft constraints on relative spatial locations of players • Constraints strongly depend on the set of player types

Previous Attempt • S. Intille. Visual Recognition of Multi-Agent Action. PhD Thesis, MIT, 1999. • Intille used a KB of hard constraints to cast the task as a SAT-like problem • Constraints: “near”, “to the left of”, “bit of vertical space between”, etc. • Simplified problem by hand-labelling the field locations of the 11 players • Only tried to infer player types • The approach could not be made to work well and was abandoned

Structured Output Representations • Infer type & location for all 11 players • t_i ∈ {QBS, QB, C, LG, RG, LTE, ...}, 34 types • l_i ∈ {(0,0), (0,1), …, (n,m)}, pixel location • 22 output variables in total • Our representation must capture • Hard joint constraints among types • Soft joint constraints among locations conditioned on types and image data • Possible to encode constraints via standard discrete factor-graph models (e.g. CRFs, weighted CSPs, ILP, etc.) • Such encodings appear problematic for off-the-shelf inference techniques (?) • Domains of variables are huge (many values) • Large factors (e.g. exactly 7 “line type” players) • Location constraints are inherently numeric

Pictorial Structures • Offensive formations can be viewed as multi-part articulated objects (parts correspond to players) • Pictorial structure models have been successful for multi-part objects in computer vision • Local part appearance models • Deformable connections • Joint estimation of part locations • Node values are part locations • Simply pairwise graphical models • Image courtesy Fischler & Elschlager
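For reference, pictorial-structure matching is usually written as an energy minimisation over part locations. The form below is the standard one from the pictorial-structures literature and is assumed, not confirmed, to be what the talk uses; the symbol m_i for the appearance cost is introduced here for illustration, while d_ij is the pairwise deformation cost.

```latex
L^{*} \;=\; \arg\min_{L=(l_1,\dots,l_n)}
  \left( \sum_{i=1}^{n} m_i(l_i)
       \;+\; \sum_{(i,j)\in E} d_{ij}(l_i, l_j) \right)
```

Here E is the edge set of the part graph, m_i(l_i) measures how poorly part i matches the image at location l_i, and d_ij(l_i, l_j) penalises deviation of parts i and j from their preferred relative placement.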

When the edge structure forms a tree, DP can compute the MAP estimate in O(nh^2) time • n = # of parts, h = # of pixels • h^2 is often impractical • If in addition d_ij(·,·) is a Mahalanobis distance, the computation can be done in O(nh) time!
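The quadratic-time dynamic program can be sketched as a leaf-to-root min-sum pass followed by a root-to-leaf backtrack. This is a minimal sketch on a toy 1-D location grid; the function `match_tree`, the toy costs, and the three-part tree are all invented for illustration, and the linear-time distance-transform speedup is not shown.

```python
def match_tree(children, root, unary, pairwise, locations):
    """Return (min_cost, {part: location}) for a tree-structured model.

    children: dict part -> list of child parts
    unary(part, loc): appearance cost of placing part at loc
    pairwise(parent, child, parent_loc, child_loc): deformation cost
    """
    best_child_loc = {}  # (child, parent_loc) -> best child location

    def upward(part):
        # cost[l] = best cost of the subtree rooted at `part` with part at l
        cost = {l: unary(part, l) for l in locations}
        for child in children.get(part, []):
            child_cost = upward(child)
            for lp in locations:          # h parent locations ...
                best, arg = None, None
                for lc in locations:      # ... times h child locations
                    c = child_cost[lc] + pairwise(part, child, lp, lc)
                    if best is None or c < best:
                        best, arg = c, lc
                cost[lp] += best
                best_child_loc[(child, lp)] = arg
        return cost

    root_cost = upward(root)
    root_loc = min(locations, key=lambda l: root_cost[l])
    # Backtrack from the root to recover every part's location.
    assignment = {root: root_loc}
    stack = [root]
    while stack:
        part = stack.pop()
        for child in children.get(part, []):
            assignment[child] = best_child_loc[(child, assignment[part])]
            stack.append(child)
    return root_cost[root_loc], assignment


# Toy model (all numbers hypothetical): center with two children on a line.
LOCS = [0, 1, 2, 3, 4]
CHILDREN = {"C": ["QB", "LG"]}
OFFSET = {"QB": 1, "LG": -1}  # ideal child offset from the parent


def toy_unary(part, l):
    return {"C": 2 * abs(l - 1), "QB": abs(l - 4), "LG": 0}[part]


def toy_pairwise(parent, child, lp, lc):
    return 2 * (lc - lp - OFFSET[child]) ** 2


cost, placement = match_tree(CHILDREN, "C", toy_unary, toy_pairwise, LOCS)
```

The two nested loops over locations are where the h^2 factor comes from; each of the n - 1 edges pays that price once.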

Pictorial Structures for Football • For a fixed set of player types, locations can be well approximated by a pictorial structure • But part sets (i.e. player types) vary across plays • Can’t use standard pictorial structures for our problem • Can we still leverage benefits of pictorial structures?

Mixture of Parts Pictorial Structures (MoPPS) • Captures constraints on legal part sets via pv • Captures spatial constraints among parts via f

MoPPS Inference • Find the MAP estimate: the most likely set of parts and their locations • Worst case: evaluate pictorial structure of each legal part set • Requires over an hour of processing for our problem • Need a structured MoPPS representation that can be exploited for fast inference • We use a “MoPPS Tree”

MoPPS Tree Representation • Pictorial structure for a legal part set is the projection of the global tree onto that part set
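One plausible reading of "projection" is sketched below: each retained part is attached to its nearest retained ancestor in the global tree. This rule is an assumption made for illustration; the slides do not spell out how the projection is defined or how edge parameters are derived when intermediate parts are dropped. The function `project_tree` and the example tree are hypothetical.

```python
def project_tree(parent, part_set):
    """Project a global tree onto a subset of its parts.

    parent: dict part -> parent part (the root maps to None)
    part_set: parts to keep
    Returns a dict part -> nearest kept ancestor (None if there is none).
    """
    projected = {}
    for part in part_set:
        ancestor = parent[part]
        # Skip over ancestors that are not part of this formation.
        while ancestor is not None and ancestor not in part_set:
            ancestor = parent[ancestor]
        projected[part] = ancestor
    return projected


# Hypothetical global tree: C -> QB -> FB -> HB
GLOBAL_PARENT = {"C": None, "QB": "C", "FB": "QB", "HB": "FB"}
projected = project_tree(GLOBAL_PARENT, {"C", "FB", "HB"})
```

In this example FB is re-attached directly to C because QB is absent from the part set, while the HB to FB edge is kept unchanged.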

MoPPS Tree for Football • 34 parts in model (one for each possible player type) • Includes local observation models • Includes pairwise spatial constraints • Also provides constraints for evaluating legal part sets

MoPPS Tree Inference • Becomes combinatorial optimization over legal part sets • We use Branch-and-Bound Search (BBS)

Branch-and-Bound Search • Search nodes are part sets • Internal nodes represent sets of legal part sets • Leaves are legal part sets • While solution not found • Expand the least node according to the ordering relation • Compute upper and lower bounds • Prune any dominated node
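The search loop above can be sketched as a best-first branch and bound over subsets. This is a toy stand-in, not the talk's implementation: legality is reduced to "exactly k parts", the cost of a part set is supplied by a caller-provided monotone function `set_cost` (a placeholder for the pictorial-structure match cost), and nodes are ordered by lower bound, which is an assumed choice of ordering relation.

```python
import heapq
from itertools import count


def branch_and_bound(parts, k, set_cost):
    """Find a size-k subset of `parts` minimising set_cost, assuming
    set_cost is monotone (adding a part never lowers the cost)."""
    tie = count()  # tiebreaker so the heap never compares part tuples
    best_cost, best_set = float("inf"), None
    # Node: (lower_bound, tiebreak, chosen_parts, index of next undecided part)
    frontier = [(set_cost(()), next(tie), (), 0)]
    while frontier:
        lower, _, chosen, i = heapq.heappop(frontier)
        if lower >= best_cost:
            break  # best-first order: every remaining node is dominated
        need = k - len(chosen)
        if need == 0:
            best_cost, best_set = lower, chosen  # leaf: a legal part set
            continue
        if len(parts) - i < need:
            continue  # not enough undecided parts left to complete the set
        # Upper bound: cost of one concrete legal completion of this node.
        completion = chosen + tuple(parts[i:i + need])
        upper = set_cost(completion)
        if upper < best_cost:
            best_cost, best_set = upper, completion
        # Branch on including or excluding the next undecided part.
        for child in (chosen + (parts[i],), chosen):
            child_lower = set_cost(child)  # valid bound by monotonicity
            if child_lower < best_cost:    # prune dominated children
                heapq.heappush(frontier, (child_lower, next(tie), child, i + 1))
    return best_cost, best_set


# Toy costs (hypothetical); the sum is a stand-in for the match cost.
TOY_COST = {"a": 3, "b": 1, "c": 2, "d": 5}
bb_cost, bb_set = branch_and_bound(
    ["a", "b", "c", "d"], 2, lambda s: sum(TOY_COST[p] for p in s)
)
```

Because an incumbent solution exists as soon as the first completion is evaluated, the loop can be interrupted at any time and still return a legal part set.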

Lower Bound Computations • Monotonicity: adding to a set of parts will never result in reduced cost • Simply compute pictorial structure match of tree projected on parts in search node • Can improve on this by adding cost for “missing parts”

Upper Bound Computations • Match entire MoPPS tree to image data • Use the match as a heuristic for quickly finding a legal completion of the current part set • Cost of completion is an upper bound

MoPPS Tree Parameters for Football • 34 parts, 3200+ legal formations • 16 basic player types plus subtypes • Connections modeled as Gaussian over ideal location relative to “parent” player • Parameters manually set using training images • Observation model uses two independent components • One based on a background model • One based on color histogramming
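A Gaussian connection of this kind turns into a quadratic deformation cost once you take the negative log-likelihood. The sketch below assumes a diagonal covariance and drops the constant normalisation term; the function `deformation_cost` and all numbers are hypothetical.

```python
def deformation_cost(parent_loc, child_loc, ideal_offset, sigma):
    """Half the squared Mahalanobis distance of the child from its ideal
    location, for a Gaussian with diagonal covariance.

    parent_loc, child_loc: (x, y) positions
    ideal_offset: preferred (dx, dy) of the child relative to the parent
    sigma: (sigma_x, sigma_y) standard deviations (assumed diagonal model)
    """
    dx = child_loc[0] - (parent_loc[0] + ideal_offset[0])
    dy = child_loc[1] - (parent_loc[1] + ideal_offset[1])
    return 0.5 * ((dx / sigma[0]) ** 2 + (dy / sigma[1]) ** 2)


# Hypothetical numbers: child is 2 units off its ideal spot in y only.
example_cost = deformation_cost((10, 10), (12, 7), (2, -5), (1, 2))
```

A cost of this form is a Mahalanobis distance, which is the condition under which linear-time matching applies.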

Background Model • Register lots of video to field model • Learn kernel density estimate of color at each pixel
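A per-pixel kernel density estimate can be sketched as follows, assuming an isotropic Gaussian kernel over RGB values. The kernel choice, the bandwidth, and the sample colors are all assumptions for illustration; the slides do not specify them.

```python
import math


def kde_density(samples, value, bandwidth):
    """Gaussian kernel density estimate of an RGB color at one field pixel.

    samples: list of (r, g, b) colors observed at this field location
    value: the (r, g, b) color to score
    bandwidth: kernel standard deviation, shared by all three channels
    """
    if not samples:
        raise ValueError("need at least one background sample")
    norm = (bandwidth * math.sqrt(2.0 * math.pi)) ** 3
    total = 0.0
    for s in samples:
        sq_dist = sum((v - c) ** 2 for v, c in zip(value, s))
        total += math.exp(-0.5 * sq_dist / bandwidth ** 2)
    return total / (len(samples) * norm)


# Hypothetical samples: a grass-colored pixel seen in several registered frames.
GRASS = [(40, 120, 40), (42, 118, 39), (38, 123, 41)]
p_grass = kde_density(GRASS, (41, 120, 40), bandwidth=10.0)
p_jersey = kde_density(GRASS, (250, 250, 250), bandwidth=10.0)
```

A pixel whose observed color has low density under its background model is more likely to belong to a player, which is how such a model can feed the observation term.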

Results

Anytime Behavior: % Correct • Exhaustive search requires close to an hour • Greedy search is fast but achieves only 80% accuracy • Mean-squared location error less than a yard

Directions: Learning MoPPS Models • Successfully hand-coded a MoPPS model • Was quite time-consuming to get parameters right • Motivates supervised structure and parameter learning • MoPPS model takes an average of 4 minutes per play • Still too slow for weekly volume of game video • Motivates speedup learning • MoPPS model will sometimes need to be relearned/adapted to different sets of video • Want to reduce labelling effort • Motivates active and transfer learning

Structure and Parameter Learning • Goal: learn structure and parameters of MoPPS tree from labelled data • Assume hard constraints on legal part sets provided • There are algorithms for learning the structure of pictorial structures • Can easily modify to learn MoPPS tree • Easy to combine with generative parameter learning

Structure and Parameter Learning • Issue: pure generative parameter learning will not likely be sufficient • Hand-coded model incorporates “reward terms” to make up for deficiencies in generative observation model • Suggests augmenting generative model with discriminatively trained components • Issue: inference time of 4 minutes makes most generative training methods quite expensive • Suggests using approaches that do not perform full joint inference for each parameter update

Speedup Learning • How can we speed up branch-and-bound search? • There are a number of interesting settings • Setting 1: • Given a MoPPS model & upper/lower bound functions • Learn effective search space operators • Setting 2: • Given a MoPPS model & search space • Learn more accurate upper/lower bound functions • Setting 3: • Given a MoPPS model & search space & possibly bounds • Learn an effective priority queue ranking function

Active Model Calibration • Want to minimize labelling effort for new video set • Active learning and/or semi-supervised learning • Want to leverage experience with previous videos • Transfer learning • How can we combine these two paradigms for label-efficient active model calibration? • User interface is also critical • Very rough idea: • Assume fixed model structure • Learn prior on parameters from previous data sets • Use prior for regularization and example selection

Summary and Future Work • New structured output challenge problem • We will provide labelled data set • Can off-the-shelf structured learning approaches work? • Suggests investigating lesser-studied directions • Speedup learning • Active calibration • On the horizon • Applying to defensive formations • Full temporal play interpretation • Mining strategic knowledge • Strategic planning

The Digital Scout Project http://eecs.oregonstate.edu/football