
Stochastic Grammars: Overview

Explore the use of stochastic grammars in recognizing and decomposing activities in the Towers of Hanoi task. Learn how internal scene models aid context-sensitive recognition and support feedback between high- and low-level reasoning components. Analyze video data using Expectation Grammars: segment objects, identify moves, and detect noise.





  1. Stochastic Grammars: Overview • Representation: Stochastic grammar • Terminals: object interactions • Context-sensitive due to internal scene models • Domain: Towers of Hanoi • Requires activities with strong temporal constraints • Contributions • Showed recognition & decomposition with very weak appearance models • Demonstrated usefulness of feedback from high- to low-level reasoning components • Extended SCFG: parameters and abstract scene models

  2. Expectation Grammars (CVPR 2003) • Analyze video of a person physically solving the Towers of Hanoi task • Recognize valid activity • Identify each move • Segment objects • Detect distracters / noise

  3. System Overview

  4. Low-Level Vision • Foreground/background segmentation • Automatic shadow removal • Classification based on chromaticity and brightness differences • Background Model • Per-pixel RGB means • Fixed mapping from chromaticity difference (CD) and brightness difference (BD) to foreground probability
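
A minimal sketch of the classification step above, assuming a per-pixel RGB background model and separate brightness/chromaticity tests; the thresholds, the shadow range, and the linear CD-to-probability ramp are illustrative assumptions, not the system's actual fixed mapping.

import numpy as np

def classify_pixels(frame, bg_mean, cd_thresh=15.0, shadow_alpha=(0.6, 1.0)):
    """Per-pixel foreground/shadow classification sketch (illustrative thresholds)."""
    frame = frame.astype(np.float64)
    bg_mean = bg_mean.astype(np.float64)
    eps = 1e-6
    # Brightness distortion (BD): scale factor aligning the pixel with its background color.
    alpha = (frame * bg_mean).sum(axis=2) / ((bg_mean ** 2).sum(axis=2) + eps)
    # Chromaticity distortion (CD): color residual once the brightness change is factored out.
    cd = np.linalg.norm(frame - alpha[..., None] * bg_mean, axis=2)
    # Shadow: chromaticity nearly unchanged, but darker than the background model.
    shadow = (cd < cd_thresh) & (alpha >= shadow_alpha[0]) & (alpha < shadow_alpha[1])
    # Fixed mapping from CD/BD to a foreground probability (a linear ramp is assumed here).
    fg_prob = np.clip(cd / (2.0 * cd_thresh), 0.0, 1.0)
    fg_prob[shadow] = 0.0
    return fg_prob, shadow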

  5. ToH: Low-Level Vision (figure panels: raw video, background model, foreground and shadow detection, foreground components)

  6. Low-Level Features (figure: merge and enter blob events) • Explanation-based symbols • Blob interaction events: merge, split, enter, exit, tracked, noise • Future work: hidden, revealed, blob-part, coalesce • All possible explanations generated • Inconsistent explanations heuristically pruned
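
A minimal sketch of how the blob-interaction vocabulary and the generate-then-prune step might look in code; the event set follows the slide, while the plausible() predicate stands in for the unspecified pruning heuristics.

from enum import Enum, auto
from itertools import product

class BlobEvent(Enum):
    MERGE = auto()
    SPLIT = auto()
    ENTER = auto()
    EXIT = auto()
    TRACKED = auto()
    NOISE = auto()

def candidate_explanations(per_blob_events, plausible):
    """Enumerate every joint labeling of a frame's blobs, then prune.

    per_blob_events: list of candidate BlobEvent sets, one per blob.
    plausible:       heuristic predicate rejecting inconsistent explanations
                     (e.g., the same blob cannot both ENTER and EXIT).
    """
    for labeling in product(*per_blob_events):
        if plausible(labeling):
            yield labeling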

  7. Expectation Grammars • Representation: stochastic grammar, with the parser augmented by parameters and an internal scene model
     ToH -> Setup, enter(hand), Solve, exit(hand);
     Setup -> TowerPlaced, exit(hand);
     TowerPlaced -> enter(hand, red, green, blue), Put_1(red, green, blue);
     Solve -> state(InitialTower), MakeMoves, state(FinalTower);
     MakeMoves -> Move(block) [0.1] | Move(block), MakeMoves [0.9];
     Move -> Move_1-2 | Move_1-3 | Move_2-1 | Move_2-3 | Move_3-1 | Move_3-2;
     Move_1-2 -> Grab_1, Put_2;
     Move_1-3 -> Grab_1, Put_3;
     Move_2-1 -> Grab_2, Put_1;
     Move_2-3 -> Grab_2, Put_3;
     Move_3-1 -> Grab_3, Put_1;
     Move_3-2 -> Grab_3, Put_2;
     Grab_1 -> touch_1, remove_1(hand,~) | touch_1(~), remove_last_1(~);
     Grab_2 -> touch_2, remove_2(hand,~) | touch_2(~), remove_last_2(~);
     Grab_3 -> touch_3, remove_3(hand,~) | touch_3(~), remove_last_3(~);
     Put_1 -> release_1(~) | touch_1, release_1;
     Put_2 -> release_2(~) | touch_2, release_2;
     Put_3 -> release_3(~) | touch_3, release_3;
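
For illustration only, a few of the productions above could be encoded as data with attached probabilities; the 0.5/0.5 split for the Grab_1 alternatives is an assumption, since the slide only gives explicit probabilities for MakeMoves.

# Hypothetical in-code encoding of a few productions:
# nonterminal -> (right-hand-side alternatives, matching probabilities).
RULES = {
    "ToH":       ([["Setup", "enter(hand)", "Solve", "exit(hand)"]],       [1.0]),
    "MakeMoves": ([["Move"], ["Move", "MakeMoves"]],                       [0.1, 0.9]),
    "Move_1-2":  ([["Grab_1", "Put_2"]],                                   [1.0]),
    "Grab_1":    ([["touch_1", "remove_1"], ["touch_1", "remove_last_1"]], [0.5, 0.5]),  # assumed uniform
}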

  8. Forming the Symbol Stream • Domain-independent blob interactions converted to terminals of the grammar via heuristic domain knowledge • Examples: merge + (x ≈ 0.33) → touch_1; split + (x ≈ 0.50) → remove_2 • Grammar rule can only fire if internal scene model is consistent with terminal • Examples: can't remove_2 if no discs on peg 2 (B); can't move disc to be on top of smaller disc (C)
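
A hedged sketch of both steps: mapping a blob interaction plus its normalized x-position to a grammar terminal, and checking a terminal against the internal scene model. The peg-3 position, the tolerance, the disc sizes, and the SceneModel class are illustrative assumptions.

# Peg x-positions in normalized image coordinates (peg 3 is assumed) and matching tolerance.
PEG_X = {1: 0.33, 2: 0.50, 3: 0.67}
TOL = 0.05
SIZES = {"red": 1, "green": 2, "blue": 3}  # assumed disc sizes, smallest to largest

def interaction_to_terminal(event, x):
    """Heuristically map a domain-independent blob event plus x-position to a terminal."""
    for peg, px in PEG_X.items():
        if abs(x - px) < TOL:
            if event == "merge":
                return f"touch_{peg}"
            if event == "split":
                return f"remove_{peg}"
    return None  # no plausible terminal: treat the event as noise

class SceneModel:
    """Internal scene model: which discs sit on each peg, bottom to top."""
    def __init__(self):
        self.pegs = {1: ["blue", "green", "red"], 2: [], 3: []}

    def can_remove(self, peg):
        # remove_N is inconsistent if no discs are on peg N.
        return bool(self.pegs[peg])

    def can_release(self, peg, disc):
        # release_N is inconsistent if it would put a disc on top of a smaller one.
        top = self.pegs[peg][-1] if self.pegs[peg] else None
        return top is None or SIZES[disc] < SIZES[top]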

  9. ToH: Example Frames • Explicit noise detection • Objects recognized by behavior, not appearance

  10. ToH: Example Frames • Detection of distracter objects • Grammar can fill in for occluded observations

  11. Finding the Most Likely Parse • Terminals and rules are probabilistic • Each parse has a total probability • Computed by Earley-Stolcke algorithm • Probabilistic penalty for insertion and deletion errors • Highest probability parse chosen as best interpretation of video
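
A minimal sketch of scoring a candidate parse and selecting the best one; the penalty values and the candidate representation are assumptions, and the actual system computes these quantities inside the Earley-Stolcke parser rather than in a separate pass.

import math

# Illustrative error penalties; the parser's actual values are not given on the slide.
P_INSERT = 1e-3  # probability assigned to a spurious (inserted) terminal
P_DELETE = 1e-3  # probability assigned to a missed (deleted) terminal

def parse_log_prob(rule_probs, terminal_probs, n_insertions, n_deletions):
    """Total log-probability of one candidate parse of the symbol stream.

    rule_probs:     probabilities of the stochastic productions used by the parse
    terminal_probs: probabilities of the observed terminals from low-level vision
    Each insertion or deletion error contributes a fixed probabilistic penalty.
    """
    logp = sum(math.log(p) for p in rule_probs) + sum(math.log(p) for p in terminal_probs)
    return logp + n_insertions * math.log(P_INSERT) + n_deletions * math.log(P_DELETE)

def best_parse(candidates):
    """Choose the highest-probability parse as the interpretation of the video.

    candidates: iterable of (rule_probs, terminal_probs, n_insertions, n_deletions) tuples.
    """
    return max(candidates, key=lambda c: parse_log_prob(*c))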

  12. Contributions • Showed activity recognition and decomposition without appearance models • Demonstrated usefulness of feedback from high-level, long-term interpretations to low-level, short-term decisions • Extended SCFG representational power with parameters and abstract scene models

  13. Lessons • Efficient error recovery is important for realistic domains • All available sources of information should be used (e.g., appearance models) • Concurrency and partial ordering are common and should be easily representable • Temporal constraints are not the only kind of action relationship (e.g., causal, statistical)

  14. Representational Issues • Extend temporal relations • Concurrency • Partial-ordering • Quantitative relationships • Causal (not just temporal) relationships • Parameterized activities
