HiPPo: Hierarchical POMDPs for Planning Information Processing and Sensing Actions on a Robot

Mohan Sridharan
Joint work with Jeremy Wyatt and Richard Dearden
University of Birmingham, UK
CS5331: Autonomous Mobile Robots

CoSy – Description
• Focus:
  • Systems that can perceive, understand and interact with the environment.
  • Sense/manipulate objects on a tabletop.
  • Back to blocks-world?
  • Dynamic response, reliability.
• Components:
  • Images with objects segmented to form regions of interest (ROIs), speech commands.
  • Bind information across different modalities like speech, vision, touch.
  • Actuator: 5 DOF “Katana” arm.

Linguistically-driven Manipulation
[Architecture diagram: Communication SA, Visual SA, Planning SA, Manip. SA, Binding SA, Coordinator SA.]
• Goals raised by language.
• Refer to objects by learned features.
• Plan intentional actions using planner.
• Intention shifting handled by monitoring and replanning.

Questions/Problems in CAS
• Binding: How do we match information from one component with information from another?
• Filtering: How does an architecture pick where a piece of information should go?
• Processing Management: How does the robot decide which bits of information should be processed, and what processing should be performed?
• Action Fusion: How should low-level actions be coordinated at a fine-grained level?

Sample Video – CoSy

Visual Processing Management
• Robot and human manipulate and converse about objects.
  • “Is there a red triangle?”, “Move the mug to the right of the blue circle.”
• Features:
  • State is not observable; actions modify the belief.
  • Non-deterministic actions: color, shape etc.
  • Computational complexity.
• Constraints:
  • Dynamic response, reliability!
• Approach: plan the visual processing: where to look, and what to look for?

Related Work
• Planning sequences of visual operations:
  • Image interpretation (POMDP: Darrell 97; MDP: Li et al. IIS03), image processing (Borg: Clouard et al. PAMI99; Astronomy: Chien et al. ProcSoft00).
• Classical planning schemes:
  • Layered architecture (Brooks, RA86), SOAR (Laird et al. AI87), ACT-R (Anderson et al. PR04), FF (Hoffmann and Nebel, JAIR01).
• Observation planners:
  • C-BURIDAN (Draper et al. UAI94), PKS (Petrick and Bacchus, ICAPS04), CP (Brenner and Nebel, PCAR06).
• Hierarchical planning:
  • MAXQ (Dietterich, ICML98), Nursebot (Pineau et al. RAS03), RN-POMDP (Foka et al. IJCAI 05).
• Imposing/learning structure in POMDPs:
  • FSC (Hansen et al. ICAPS03), DBN (Toussaint et al. UAI08).

POMDP for one ROI
• States: Cartesian product of individual state vectors.
• Actions: visual + “special”.
• Observations: red, green, blue, circle, triangle, square, empty, unknown.

POMDP for one ROI
• Transition function.
• Observation function.
• Reward specification.
• Excellent mathematical machinery to model desired features: probabilistic representation for uncertainty in action outcomes and states.
• Drawback: the state space grows exponentially with the number of ROIs and actions: 25^n + 1 states for n ROIs with just two visual actions!
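The state count quoted above can be checked with a few lines of Python. This is a minimal sketch, not the authors' model: the class labels in `COLOR_CLASSES` and `SHAPE_CLASSES` are assumed for illustration (the slides list the observations, not the exact state labels), chosen so that two visual actions give 5 × 5 = 25 states per ROI plus one terminal state.

```python
from itertools import product

# Assumed per-feature class labels for a single ROI (illustrative only).
COLOR_CLASSES = ["empty", "red", "green", "blue", "mixed"]
SHAPE_CLASSES = ["empty", "circle", "triangle", "square", "mixed"]


def single_roi_states():
    """Cartesian product of the per-feature classes: 5 x 5 = 25 states."""
    return list(product(COLOR_CLASSES, SHAPE_CLASSES))


def joint_state_count(n_rois):
    """Joint state space for n ROIs, plus one shared terminal state."""
    return len(single_roi_states()) ** n_rois + 1


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} ROI(s): {joint_state_count(n)} states")
```

Under these assumptions, four ROIs already yield 390626 joint states, which is why a single flat POMDP over all ROIs quickly becomes impractical.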

Hierarchical POMDP Formulation
• Proposed solution: Hierarchical Planning in POMDPs – HiPPo
• One POMDP for planning the processing actions on each ROI.
• Higher-level (HL) POMDP to choose one of the lower-level (LL) POMDPs at each step.
• Significantly reduces complexity of the state-action-observation space.
• Model creation and policy generation are completely autonomous, based on the input query.
[Diagram: the HL-POMDP answers “Which region to process?”; the LL-POMDP answers “How to process?”]

HiPPo – LL Formulation
• Operates on a single ROI.
• Key points:
  • Observation functions learned.
  • Transition function is an identity matrix, except for special actions and actions that change the state.
  • Reward function trade-off: time-based cost for actions and answer quality.
  • LL-policy is terminated after N levels.
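Because the transition function is the identity for ordinary visual actions, the belief update after such an action reduces to a single Bayes-rule step with the learned observation function. The sketch below illustrates that step only; the state names and the observation probabilities (0.8 / 0.1) are made-up values, not the learned ones.

```python
def belief_update(belief, obs_model, observation):
    """Bayes update b'(s) proportional to P(o | s) * b(s).

    Valid only when the executed action leaves the state unchanged
    (identity transition), as assumed for the visual operators here.
    """
    unnormalized = {
        s: obs_model[s].get(observation, 0.0) * p for s, p in belief.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: v / total for s, v in unnormalized.items()}


# Hypothetical observation model for a color operator: P(observation | true color).
COLOR_OBS = {
    "red": {"red": 0.8, "green": 0.1, "blue": 0.1},
    "green": {"red": 0.1, "green": 0.8, "blue": 0.1},
    "blue": {"red": 0.1, "green": 0.1, "blue": 0.8},
}

if __name__ == "__main__":
    b0 = {"red": 1 / 3, "green": 1 / 3, "blue": 1 / 3}
    b1 = belief_update(b0, COLOR_OBS, "blue")
    b2 = belief_update(b1, COLOR_OBS, "blue")
    print(b1["blue"], b2["blue"])
```

Repeating the operator sharpens the belief but costs time each run, which is the trade-off the reward function has to balance.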

HiPPo – HL Formulation
• HL-POMDP:
  • State space: object presence in different combinations of regions.
  • Action u_i means process region R_i.
  • F_{R_i} means the desired object was found in R_i.
• Key points:
  • Observation functions and costs derived from the policy trees of LL-POMDPs.
  • LL-POMDPs are black boxes that return definite labels (not belief densities).
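As a rough illustration of the HL state and action spaces described above, the sketch below enumerates every combination of "target present in R_i" and one processing action per region. It is an assumption-laden simplification: any terminal state or answer-reporting ("special") actions the actual HL-POMDP may include are left out, and the function names are hypothetical.

```python
from itertools import product


def hl_states(n_rois):
    """All present/absent assignments for the target object, one flag per ROI."""
    return list(product([False, True], repeat=n_rois))


def hl_actions(n_rois):
    """One processing action u_i per region R_i."""
    return [f"u{i}" for i in range(1, n_rois + 1)]


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, len(hl_states(n)), hl_actions(n))
```

With this encoding the HL state space has 2^n entries for n ROIs, far fewer than the 25^n + 1 states of the joint formulation.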

Illustrative Example
• Consider the scene with two ROIs extracted.
• Query: “Where is the blue circle?”
• Available operators: Color, Shape, SIFT.

Example – Where is the Blue Circle?

Estimating O_H and R_H
• Condition the LL observation probabilities on the high-level states.
• Determine the expected cost of running the policy tree and the likelihood of finding the target object, conditioned on the high-level state.
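One way to picture this estimation is to walk an LL policy tree and accumulate, for each possible true state of the ROI, the expected processing cost and the probability that the tree ends by reporting the target as found; these are then averaged under the distribution implied by the HL state. The sketch below is only an illustration under assumed numbers: the tree, costs, observation probabilities and the uniform distribution over non-target states are all hypothetical.

```python
def evaluate(node, obs_prob, state):
    """Return (expected_cost, prob_found) for the subtree at `node`,
    given that the ROI's true state is `state`."""
    if "label" in node:  # leaf: the LL policy reports a definite label
        return 0.0, (1.0 if node["label"] == "found" else 0.0)
    cost, p_found = node["cost"], 0.0
    for obs, child in node["children"].items():
        p = obs_prob[node["action"]][state].get(obs, 0.0)
        child_cost, child_found = evaluate(child, obs_prob, state)
        cost += p * child_cost
        p_found += p * child_found
    return cost, p_found


def condition_on_hl(tree, obs_prob, ll_dist):
    """Average cost and found-probability over the LL states implied by an HL state."""
    results = [(w, evaluate(tree, obs_prob, s)) for s, w in ll_dist.items()]
    return (sum(w * r[0] for w, r in results), sum(w * r[1] for w, r in results))


# Hypothetical policy tree for the query "blue circle": color first, then shape.
TREE = {"action": "color", "cost": 1.0, "children": {
    "blue": {"action": "shape", "cost": 2.0, "children": {
        "circle": {"label": "found"}, "square": {"label": "not_found"}}},
    "red": {"label": "not_found"}}}

# Hypothetical observation probabilities P(obs | action, true state).
OBS = {
    "color": {"blue_circle": {"blue": 0.9, "red": 0.1},
              "red_circle": {"blue": 0.1, "red": 0.9},
              "blue_square": {"blue": 0.9, "red": 0.1}},
    "shape": {"blue_circle": {"circle": 0.8, "square": 0.2},
              "red_circle": {"circle": 0.8, "square": 0.2},
              "blue_square": {"circle": 0.2, "square": 0.8}},
}

if __name__ == "__main__":
    present = condition_on_hl(TREE, OBS, {"blue_circle": 1.0})
    absent = condition_on_hl(TREE, OBS, {"red_circle": 0.5, "blue_square": 0.5})
    print("target present:", present)
    print("target absent: ", absent)
```

The found-probability plays the role of an HL observation probability for action u_i, and the expected cost feeds the HL reward for that action.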

Experimental Setup
• The HL-POMDP and LL-POMDPs are query-specific.
• LL-POMDPs for each ROI written in ZMDP format.
• Solved using point-based value iteration (VI) [Smith & Simmons, 05].
• The LL policies yield the observation probabilities and costs for the HL-POMDP, which is solved in a similar manner.
• Performed ~60 queries, multiple trials of each.
  • Occurrence: “Is there a red cup in the scene?”
  • Location: “Where is the blue circle?”
  • Property: “What colour is the box?”
  • Global Scene: “How many green squares are there?”

Joint POMDP vs. HiPPo

A ‘Modern’ Classical Planner
• Continual Planning (CP) [Brenner & Nebel, 06] provides a solution to this problem.
• CP allows actions with non-deterministic effects:
  • Use these to represent information gathering actions.
  • Assumes that actions are reliable.
  • At plan time, the planner asserts that the effect it wants will actually occur.
  • If the effect doesn’t occur at execution time, replan.
• Intuitively: build a contingent plan, but replanning ensures you only build the branches you need.
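The assert-then-replan idea can be illustrated with a toy loop. This is not the CP planner of Brenner and Nebel, only a hand-written sketch of the control flow: `make_plan` and `continual_planning` are hypothetical helpers, the scene is a plain dictionary, and sensing is treated as perfectly reliable, mirroring the assumption listed above.

```python
def make_plan(knowledge, rois, target):
    """Optimistic plan for the first ROI not yet ruled out: sense each unknown
    feature, asserting that the outcome will match the target, then report."""
    for roi in rois:
        facts = knowledge.get(roi, {})
        if any(facts.get(f) not in (None, v) for f, v in target.items()):
            continue  # an observed fact already contradicts the target
        steps = [("sense", roi, f, v) for f, v in target.items() if f not in facts]
        return steps + [("report", roi)]
    return [("report", None)]  # every ROI has been ruled out


def continual_planning(scene, target):
    """Plan, execute, monitor, and replan whenever an asserted effect fails.
    Returns (reported ROI or None, number of replans)."""
    knowledge, replans = {}, 0
    rois = sorted(scene)
    while True:
        plan = make_plan(knowledge, rois, target)
        for step in plan:
            if step[0] == "report":
                return step[1], replans
            _, roi, feature, expected = step
            observed = scene[roi][feature]  # sensing assumed reliable
            knowledge.setdefault(roi, {})[feature] = observed
            if observed != expected:
                replans += 1
                break  # asserted effect did not occur: build a new plan


if __name__ == "__main__":
    scene = {"R1": {"color": "red", "shape": "circle"},
             "R2": {"color": "blue", "shape": "circle"}}
    print(continual_planning(scene, {"color": "blue", "shape": "circle"}))
```

Only the branches actually encountered are ever planned, but because the observed label is trusted as a fact, a single wrong sensor reading would permanently rule out the correct ROI.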

Comparison of Planning Time

Comparison of Planning + Execution Time

Reliability Analysis
• Modern planners that do not model uncertainty cannot do much better than naïve visual processing.
• HiPPo exploits models of action outcomes to provide higher reliability.

Summary and Future Work
• Visual processing management posed as a planning problem.
• HiPPo models uncertainty well, provides efficient and reliable performance.
• HiPPo takes slightly more time than CP but is significantly more reliable.
• Lots of other operators to integrate: Viewpoint Change, Zoom, …
• Object interaction:
  • Push, poke object?
  • Learn object (epistemic) affordances.

Summary and Future Work
• From image analysis to scene analysis:
  • Should I look somewhere else or analyse the ROIs I have now?
  • Use information maximization principles.
• Incorporate on a mobile robot, and a team of mobile robots.
• Model human feedback to learn from and interact with humans.
• Collaborate with humans in real-world tasks.
• Joint project with UT-Austin and University of Arizona (3-5 years).

That’s all folks

We really are done
