1 / 47

Understanding Motion and Behavior Structures: Part 2

Explore the three levels of motion understanding, from movement to action to activity, and delve into the grammar-based representation and parsing of structured events using Stochastic Context-Free Grammars. Learn about the advantages of using SCFGs for handling uncertainties in activities and how they enforce spatial and temporal consistency for parsing complex behaviors.

donna-tran
Download Presentation

Understanding Motion and Behavior Structures: Part 2

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Seeing Action: Part 2Statistics and/or Structure Aaron Bobickafb@cc.gatech.edu School of Interactive Computing College of ComputingGeorgia Tech

  2. Context Continuing from the lower middle??? • Three levels of understanding motion or behavior: Movement - atomic behaviors defined by motion "Bending down", "(door) rising up", Swinging a hammer Action – a single, semantically meaningful "event" "Opening a door", "Lifting a package" Typically short in time Might be definable in terms of motion; especially so in a particular context. Activity – a behavior or collection of actions with a purpose/intention. "Delivering packages" Typically has causal underpinnings Can be thought of as statistically structured events Maybe Actions are movements in context??

  3. Structure and Statistics (Old and new) • Grammar-based representation and parsing • Highly expressive for activity description • Easy to build higher level activity from reused low level vocabulary. • P-Net (Propagation nets) – really stochastic Petri nets • Specify the structure – with some annotation can learn detectors and triggering probabilities • Statistics of events • Low level events are statistically sequenced – too hard to learn full model. • N-grams or suffix trees

  4. "Higher-level" Activities: Known structure, uncertain elements • Many activities are comprised of a priori defined sequences of primitive elements. • Dancing, conducting, pitching, stealing a car from a parking lot. • The states are not hidden. • The activities can be described by a set of grammar-like rules; often ad hoc approaches taken. • But, the sequences are uncertain: • Uncertain performance of elements • Uncertain observation of elements

  5. The basic idea and approach • Low-level primitives with uncertain feature detection (individual elements might be HMMs) • High-level description found by parsing input stream of uncertain primitives. • Extend Stochastic Context Free Grammars to handle perceptually relevant uncertainty. Idea: split the problem into: Approach:

  6. Stochastic CFGs • Traditional SCFGs have probabilities associated with the production rules. Traditional parsing yields most likely parse given a known set of input symbols. PIECE -> BAR PIECE | [0.5] BAR [0.5] BAR -> TWO | [0.5] THREE [0.5] THREE -> down3 right3 up3 [1.0] TWO -> down2 up2 [1.0] • Thanks to Andreas Stolcke’spriori work on parsing SCFGsusing efficient Earley parser.

  7. Extending SCFGs (Ivanov and Bobick, PAMI) • Within the parser we handle: • Uncertainty about input symbols Input is multi-valued string (vector of likelihoods) • Deletion, substitution, and insertion errors Introduce error rules • Individually recognized primitives typically temporally inconsistent Introduce penalty for overlap. Spatial and temporal consistency enforced. • Need to define when a symbol has been generated. • How do we learn production probabilities? (Not many examples.) Make sure not too sensitive to them.

  8. Enforcing temporal consistency Output of oneHMM parsing backwards - Output event P(primitive) Time

  9. Video Sample

  10. Event Grammar and Parsing • Tracker generates events: ENTER, LOST, FOUND, EXIT, STOP. Tracks have properties (e.g. size) and trajectories. • Tracker assigns class to each event, though only probabilistically. • Parser parses single stream that contains interleaved events: (CAR-ENTER, CAR-STOP, PERSON-FOUND, CAR-EXIT, PERSON-EXIT) • Parser enforces spatial and temporal consistency for each object class and interactions (e.g. to be a PICK-UP, the PERSON-FOUND event must be close to CAR-STOP) • Spatial and temporal consistency eliminates symbolic ambiguity.

  11. Advantages of SCFGs • What grammar can do (simplified): CAR_PASS -> CAR_ENTER CAR_EXIT | CAR_ENTER CAR_HIDDEN CAR_EXIT CAR_HIDDEN -> CAR_LOST CAR_FOUND | CAR_LOST CAR_FOUND CAR_HIDDEN • Skip allows concurrency (and junk): PERSON_LOST -> person_lost | SKIP person_lost • Concurrent parse: Events: ce pe cl cf cs px pl cx PICKUP -> ce pe clcfcs px plcx P_PASS -> ce pe cl cf cs px pl cx

  12. Parsing System

  13. Parse 1: Person-pass- through

  14. Parse 2: Drive-in

  15. Parse 3: Car-pass-through

  16. Parse 4: Drop-off

  17. Advantages of STCFG approach • Structure and components of activities defined a priori and are the right levels of annotation to recover (compare to HMMs). • FSM vs CFG is not the point. Rather explicit representation of structural elements and uncertainties. • Often many (enough) examples of each primitive to support training, but not of higher level activity. • Allows for integration of heterogeneous primitive detectors; only assumes likelihood generation. • More robust than ad-hoc rule based techniques: handles errors through probability. • No notion of causality, or anything other than (multi-stream) sequencing.

  18. Advantages of STCFG approach • Structure and components of activities defined a priori and are the right levels of annotation to recover (compare to HMMs). • FSM vs CFG is not the point. Rather explicit representation of structural elements and uncertainties. • Often many (enough) examples of each primitive to support training, but not of higher level activity. • Allows for integration of heterogeneous primitive detectors; only assumes likelihood generation. • More robust than ad-hoc rule based techniques: handles errors through probability. • No notion of causality, or anything other than (multi-stream) sequencing.

  19. Some Q's about Representations… • Scope and Range: • thoughts??? • "Language" of the representation • Grammar of explicit symbols • Computability of an instance: • Quite easy. Given the input string the parsing is both the computation and the matching • Learnability of the "class": • Inside-outside algorithm for learning CFGs but lets be serious… • Stability in face of perceptual uncertainty • Explicitly designed to handle this uncertainty • Inference-support • Depends on what you mean by inference. No notion of real semantics or explicit time.

  20. P-Nets (Propagation Networks) (Shi and Bobick, ’04 and ’06) • Nodes represent activationintervals • Active vs. inactive: Token propagation • More than one node can be active at a time! • Links represent partial order as well logical constraint • Duration model on each link and node: • Explicit model on length of activation • Explicit model on length between successive intervals • Observation model on each node

  21. Conceptual Schema • Logical relation • Autonomous assumption: logic constraint only exists at start/end points of any intervals • Condition probability function can represent any logical function Examples of logic constraint

  22. Propagation Net – Computing • Computational Schema A DBN style rollout to compute corresponding conceptual schema

  23. Experiment: Glucose Project • Task: monitor an user to calibrate a glucose meter and point out operating error as feedback. • Constructed 16 node P-Net as representation • 3 subjects with total of 21 perfect sequences, 10 missing_1_step sequences and 10 missing_6_steps sequences

  24. D-Condensation Initiate 1 particle at dummy starting node Repeat For each particle generate all possible consequent states calculate the probability for each states End Select n particles to survive Until the final time steps is reached Output the path represented by the particle with highest probability

  25. Experiment: Glucose Meter Calibration

  26. Experiment: Classification Performance

  27. Experiment: Label individual frames Labeling individual nodes Labels on Node J: Insert

  28. And now some statistics… • Problem: the higher level world of activity is not usually a P-Net or an FSM or an HMM or … • Two possible solutions: • Understand what's really going on… …another time. • Lose the structure

  29. Stochastic sequencing (Hamid and Bobick) • A priori define some low-level "actions"/events that can be stochastically detected in context – e.g. Door opening • Collect training data (streams of events) of activities – making a delivery, UPS pick-up, trash collection • Collect histograms of N-tuples and do both activity discovery and recognition • Later can focus on anomalies • Advantages: cheat where easy, learn the hard stuff, exploit the context

  30. Barnes & Nobles Loading Dock

  31. Barnes & Nobles Loading Dock

  32. Barnes & Nobles Loading Dock

  33. Barnes & Nobles Loading Dock

  34. Two levels in the representation Low Level: Events (computer vision problem) • Background subtraction and Foreground extraction (better “modeling”) • Classifying (per frame) each foreground object as either • Person • Vehicle (what type if possible) • Package • Tool used to move packages • Miscellaneous object • Tracking people, vehicles, packages, tools, and miscellaneous objects over multiple frames

  35. Two levels in the representation • Higher Level: Statistical characterization of subsequences • Instances of same activity class have certain common subsequences. • But, partially ordered will typically rearrange subsequences within the sequence. • Find a “soft” characterization of the statistics of the subsequences • Deifne similarity measure for such characterization. • Allows discovery of activity classes • Allows for detection of anomalous examples • Caveats: • We provide the events – whether it’s manually or specifying the detector doesn’t really matter (except for publication) • Training needs pre-segmentation

  36. Stochastic sequencing: n-grams

  37. Experimental Setup – Loading Dock • Barnes & Noble Loading Dock Area • One month worth of data: • 5 days a week • 9 a.m. till 5 p.m. • Event Vocabulary – 61 events • Hand-labeled for testing activity labeling, noise sensitivity. • Training detectors for these events Bird’s Eye View of Experimental Setup

  38. B&N Processing Video

  39. Activity-Class Discovery • Treating activities as individual instances • Activity-class discovery – finding maximal cliques in edge weighted graphs • Need to come up with: • Activity Similarity Metric • Procedure to group similar activities

  40. Activity Similarity • Two types of differences • structural differences • frequency differences • Sim(A,B) = 1 – normalized difference between the counts of non-zeros event n-grams • Properties • additive identity • is commutative • does not follow triangular in-equality

  41. Activity-Class Discovery • A graphic theoretic problem of finding maximal cliques in edge-weighted graphs [Pavan, Pelillo ‘03] • Sequentially find maximal cliques in edge weighted graph of activities • Activities different enough from all the regular activities are anomalies

  42. Activity-Class Discovery – Dominant Sets

  43. Anomaly Detection • Compute the within-Class similarity of the test activity w.r.t. previous class members • Learn the detection threshold from training data – can be done using an R.O.C.

  44. Anomaly "Explanation" • Explanatory features – their frequency has high mean and low variance • Explanation based on features that were: • Missingfrom an anomaly but were frequently and consistentlypresent in regular members • Extraneous in an anomaly but consistently absentfrom the regular members

  45. Results General Characteristics of Discovered Activity Classes Few of the detected Anomalies • UPS Delivery Vehicles • Fed Ex Delivery Vehicles • Delivery Trucks – multiple packages delivered • Cars and vans, only 1 or 2 packages delivered • Motorized cart used to pick and drop packages • Van deliveries – no use of motorized cart • Delivery trucks – multiple people • Back door of delivery not closed • (b) More than usual number of people • involved in unloading • (c) Very few vocabulary events performed

  46. Results • Are the detected anomalous activities ‘interesting’ from human view-point? • Anecdotal Validation: • Studied 7 users • Showed each user 8 regular activities selected randomly • Showed each user 10 test activities, 5 regular and 5 detected anomalous activities • 8 out of 10 activity-labels of the users matched the labels of our system • Probability of this match happening by chance is 4.4%

  47. Some Q's about Representations… (more discussion) • Scope and Range: • A monitored scene with pre-designed detectors • "Language" of the representation • Histograms and other statistics of feature n-gram occurrences • Computability of an instance: • Given detectors, easy to compute • Learnability of the "class": • Full power of statistical learning. Even allowing notion of outlier detector. • Stability in face of perceptual uncertainty • Fair. Needs to be better. • Inference-support • Distance-in-feature space reasoning only.

More Related