Planning to Gather Information Richard Dearden University of Birmingham Joint work with Moritz Göbelbecker (ALU), Charles Gretton, Bramley Merton (NOC), Zeyn Saigol, Mohan Sridharan (Texas Tech), Jeremy Wyatt
Underwater Vent Finding • AUV used to find vents • Can detect the vent itself (reliably) and the plume of fresh water it emits • Problem is where to go to collect data to find the vents as efficiently as possible • Hard because plume detection is unreliable, and we can’t easily assign ‘blame’ for the detections we do make
Vision Algorithm Planning • Goal: Answer queries and execute commands. • Is there a red triangle in the scene? • Move the mug to the right of the blue circle. • Our operators: colour, shape, SIFT identification, viewpoint change, zoom etc. • Problem: Build a plan to achieve the goal with high confidence
Assumptions • The visual operators are unreliable • Reliability can be represented by a confusion matrix, computed from data • Speed of response and answering the query correctly are what really matter • We want to build the fastest plan that is ‘reliable enough’ • We should include planning time in our performance estimate too
POMDPs • Partially Observable Markov Decision Problems • Markov Decision Problem: • (discrete) States, stochastic actions, reward • Maximise expected (discounted) long-term reward • Assumption: state is completely observable • POMDPs: MDPs with observations • Infer state from (sequence of) observations • Typically maintain belief state, plan over that
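The last two bullets (inferring the state from observations and maintaining a belief state) correspond to the standard Bayes-filter update. The symbols are notational assumptions, not taken from the slides: b is the current belief, a the action taken, o the observation received, T the transition function and O the observation function.

```latex
b'(s') \;=\;
\frac{O(o \mid s', a)\,\sum_{s} T(s' \mid s, a)\, b(s)}
     {\sum_{s''} O(o \mid s'', a)\,\sum_{s} T(s'' \mid s, a)\, b(s)}
```

The denominator is the probability of observing o after taking a in belief b, so the updated belief sums to one.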
POMDP Formulation • States: Cartesian product of individual state vectors • Actions: A = {Colour, Shape, SIFT, terminal actions} • Observations: {red, green, blue, circle, triangle, square, empty, unknown} • Transition function • Observation function given by confusion matrices • Reward specification: time cost of actions, large +ve/-ve rewards on terminal actions • Maintain belief over states, likelihood of action outcomes
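A minimal sketch of how a confusion matrix can serve as the observation function when updating a belief. It assumes the sensing actions do not change the scene (so the transition is the identity), and the two-colour state set and all probabilities are made-up illustrative values, not figures from the talk.

```python
# Hypothetical confusion matrix for a Colour operator:
# CONFUSION[true_colour][reported_label] = P(reported_label | true_colour).
# Every number here is illustrative, not measured data.
CONFUSION = {
    "red":  {"red": 0.8, "blue": 0.1, "unknown": 0.1},
    "blue": {"red": 0.2, "blue": 0.7, "unknown": 0.1},
}


def update_belief(belief, confusion, observation):
    """Bayes update of a belief over static states after one observation."""
    unnormalised = {s: belief[s] * confusion[s][observation] for s in belief}
    total = sum(unnormalised.values())
    return {s: p / total for s, p in unnormalised.items()}


prior = {"red": 0.5, "blue": 0.5}
posterior = update_belief(prior, CONFUSION, "blue")
print(posterior)
```

Repeating the update with further observations (for example from a second run of the same operator) sharpens the belief, which is what lets a planner trade extra sensing time against answer confidence.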
POMDP Formulation • For a broad query: ‘what is that?’ • For each ROI: • 26 states (5 colours x 5 shapes + term) • 12 actions (2 operations, 10 terminal actions SayBlueSquare, SayRedTriangle, SayUnknown, …) • 8 observations • For n ROIs: • 25^n + 1 states • Impractical for even a very small number of ROIs • BUT: There’s lots of structure. How to exploit it?
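To see how quickly the flat formulation grows, here is a small calculation. It assumes the joint state is the Cartesian product of 25 colour-shape labels per ROI plus one shared terminal state, which matches the 26 states quoted for a single ROI; the function name is hypothetical.

```python
def flat_state_count(n_rois, labels_per_roi=25):
    """Joint states for n ROIs: one label per ROI, plus a terminal state."""
    return labels_per_roi ** n_rois + 1


for n in range(1, 5):
    print(n, flat_state_count(n))
```

Under this assumption three ROIs already need 15,626 states and four need 390,626, which is why the flat model is impractical.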
A Hierarchical POMDP • Proposed solution: Hierarchical Planning in POMDPs (HiPPo) • One LL-POMDP for planning the actions in each ROI • Higher-level POMDP to choose which LL-POMDP to use at each step • Significantly reduces complexity of the state-action-observation space • Model creation and policy generation are automatic, based on the input query
[Diagram: HL POMDP (Which region to process?) above LL POMDP (How to process?)]
Low-level POMDP • The LL-POMDP is the same as the flat POMDP, but only ever operates on a single ROI • 26 states, 12 actions • Reward combines time-based cost for actions and answer quality • Terminal actions are answering the query for this region
Example • Query: ‘where is the blue circle?’ • State space: {RedCircle, RedTriangle, BlueCircle, BlueTriangle, …, Terminal} • Actions: {Colour, Shape, …, SayFound, …} • Observations: {Red, Blue, NoColour, UnknownColour, Triangle, Circle, NoShape, UnknownShape, …} • Observation probabilities given by confusion matrix
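Policy • Policy tree for uniform prior initial state • We limit all LL policies to a fixed maximum number of steps
[Figure: policy tree. The root action is Colour, with observation branches B and R; below it are Shape actions with observation branches C and T; the leaves are the terminal actions sFound and sNotFound.]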
High-level POMDP • State space consists of the regions the object of interest is in • Actions are regions to process • Observations are whether the object of interest was found in a particular region • We derive the observation function and action costs for the HL-POMDP from the policy tree for the LL-POMDP • Treat the LL-POMDP as a black box that returns definite labels (not belief densities)
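One way the HL observation function and action cost could be derived from an LL policy tree is to push each possible true label through the tree and accumulate the probability of reaching each terminal answer. The sketch below is an illustration under stated assumptions, not the authors' implementation: the tree shape, the confusion values, the costs, and the names `evaluate` and `ATTRIBUTE_OF` are all hypothetical.

```python
# P(observation | true attribute value) per operator; illustrative numbers only.
CONFUSION = {
    "Colour": {"blue": {"B": 0.9, "R": 0.1}, "red": {"B": 0.2, "R": 0.8}},
    "Shape": {"circle": {"C": 0.85, "T": 0.15}, "triangle": {"C": 0.1, "T": 0.9}},
}
COST = {"Colour": 1.0, "Shape": 2.0}            # assumed time cost per operator
ATTRIBUTE_OF = {"Colour": "colour", "Shape": "shape"}

# A policy tree is either a terminal label (str) or (action, {observation: subtree}).
POLICY = ("Colour", {
    "B": ("Shape", {"C": "sFound", "T": "sNotFound"}),
    "R": "sNotFound",
})


def evaluate(tree, true_state):
    """Return ({terminal label: probability}, expected cost) for one true state."""
    if isinstance(tree, str):                    # leaf: a definite answer, no cost
        return {tree: 1.0}, 0.0
    action, children = tree
    true_value = true_state[ATTRIBUTE_OF[action]]
    dist, expected_cost = {}, COST[action]
    for obs, p_obs in CONFUSION[action][true_value].items():
        sub_dist, sub_cost = evaluate(children[obs], true_state)
        expected_cost += p_obs * sub_cost
        for label, p in sub_dist.items():
            dist[label] = dist.get(label, 0.0) + p_obs * p
    return dist, expected_cost


dist, cost = evaluate(POLICY, {"colour": "blue", "shape": "circle"})
print(dist, cost)
```

Here `dist["sFound"]` plays the role of P(Found | object is in this region) for the HL-POMDP, and `cost` the role of the cost of choosing to process that region, consistent with treating the LL-POMDP as a black box that returns definite labels.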
Example • Query: ‘where is the blue circle?’ • State space: • Actions: {DoR1, DoR2, SayR1, SayR2, SayR1∧R2, SayNo} • Observations: {FoundR1, ¬FoundR1, FoundR2, ¬FoundR2} • Observation probabilities are computed from the LL-POMDP
Vent Finding Approach • Assume mapping using an occupancy grid • Rewards are given only for visiting cells that contain vents • Here too the state space is too large to solve the POMDP • Instead do fixed-length lookahead in belief space • Reasoning in belief space allows us to account for the value of information gained from observations • Use P(vent|all observations so far) as heuristic value at end of lookahead
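A toy sketch of fixed-length lookahead in belief space, to make the idea concrete. Everything in the model is an assumption made for illustration: a 1-D grid with at least two cells, independent per-cell vent probabilities, a plume reading that depends only on the next cell to the right, the hit and false-alarm rates, and a leaf heuristic that takes the highest P(vent) among the cells adjacent to the final position. It is not the planner used in the work described.

```python
P_HIT = 0.7          # assumed P(plume detected | vent in upstream cell)
P_FALSE_ALARM = 0.1  # assumed P(plume detected | no vent in upstream cell)


def plume_update(p_vent, detected):
    """Posterior P(vent) for the upstream cell after one plume reading."""
    like_vent = P_HIT if detected else 1.0 - P_HIT
    like_none = P_FALSE_ALARM if detected else 1.0 - P_FALSE_ALARM
    numerator = like_vent * p_vent
    return numerator / (numerator + like_none * (1.0 - p_vent))


def legal_steps(belief, pos):
    """Moves (-1 or +1) that stay on the grid."""
    return [s for s in (-1, 1) if 0 <= pos + s < len(belief)]


def leaf_heuristic(belief, pos):
    """Heuristic at the lookahead horizon: best current P(vent) next to pos."""
    return max((belief[pos + s] for s in legal_steps(belief, pos)), default=0.0)


def action_value(belief, pos, step, depth):
    """Expected value of moving by `step`, then acting well for depth-1 steps."""
    nxt = pos + step
    reward = belief[nxt]            # vent detection is reliable: expected reward
    after = list(belief)
    after[nxt] = 0.0                # cell resolved, nothing left to collect
    upstream = nxt + 1
    if upstream >= len(after):      # no plume reading available at the edge
        return reward + state_value(after, nxt, depth - 1)
    p = after[upstream]
    p_detect = P_HIT * p + P_FALSE_ALARM * (1.0 - p)
    future = 0.0
    for detected, p_obs in ((True, p_detect), (False, 1.0 - p_detect)):
        branch = list(after)
        branch[upstream] = plume_update(p, detected)
        future += p_obs * state_value(branch, nxt, depth - 1)
    return reward + future


def state_value(belief, pos, depth):
    """Value of a belief state with `depth` moves of lookahead remaining."""
    if depth == 0:
        return leaf_heuristic(belief, pos)
    return max(action_value(belief, pos, s, depth) for s in legal_steps(belief, pos))


def best_action(belief, pos, depth):
    """Step with the highest lookahead value from the current belief."""
    return max(legal_steps(belief, pos),
               key=lambda s: action_value(belief, pos, s, depth))


print(best_action([0.1, 0.0, 0.5], pos=1, depth=1))
```

Because the value is averaged over possible plume readings before the later moves are chosen, a move that yields an informative reading can score higher than its immediate reward alone would suggest. This is the value-of-information effect the slide refers to.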
What we’re working on now • Most of these POMDPs are too big to solve • Take a domain and problem description in a very general language, and generate a classical planning problem from it • Assume we can observe any variable we care about • For each such observation, use a POMDP planner to determine the value of the variable with high confidence