Goals, plans, and planning
Northwestern University CS 395 Behavior-Based Robotics
Ian Horswill
Modal logic
• Need to reason about
  • States of knowledge
  • Goals
• These aren't propositions about objects …
• … but rather about other propositions

(define-signal front-sonar
  …
  (mode (know (< front-sonar 2000))))
…
(define-signal fspace
  (min front-sonar front-left-sonar front-right-sonar))

(define-signal advance
  (behavior (know fspace)
            (rt-vector 0 fspace)))
Modalities in GRL
• In GRL, a modality is a special kind of signal procedure
• The signal it returns is just a default
• You can override it with a mode declaration
• It's memoized so that it always returns the same signal object when called on the same signal object (see the Python analogy below)

(define-signal-modality (mymode x)
  … compute default …)

(define-signal sig expr
  (mode (mymode expr)))
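To see what that memoization means in more familiar terms, here is a rough Python analogy (not GRL): the Signal class and the know cache below are invented purely for illustration, but they show why calling the modality twice on the same signal yields the same mode object.

# Python analogy (not GRL): a memoized "know" modality over made-up Signal objects.
class Signal:
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)

_know_cache = {}   # memo table: signal object -> its "know" mode signal

def know(sig):
    # Always return the same mode object for the same signal object.
    if sig not in _know_cache:
        # Default definition: we know sig when we know all of its inputs.
        _know_cache[sig] = Signal("know(" + sig.name + ")",
                                  [know(i) for i in sig.inputs])
    return _know_cache[sig]

Because know(s) is the same object on every call, a mode declaration only has to override that one signal for the override to take effect everywhere the mode is used.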
Simplified modality definitions

(define-signal-modality (know x)
  (define inputs (signal-inputs x))
  (signal-expression (apply and (know inputs))))

(define-signal-modality (goal x)
  (define the-mode (signal-expression (accumulate or)))
  (define (forward-goal y)
    (drive-signal! x y))
  (for-each forward-goal (signal-inputs x))
  the-mode)
GRL modal logic API
• (know x): Whether x's value is known
• (goal x): True if x is a goal of achievement. Robot "wants" to make it true and move on
• (maintain-goal x): True if x is a maintenance goal. Robot "wants" to make it true and keep it true
• (know-goal x): True if x is a knowledge goal. Robot "wants" to determine the value of x
Built-in inference axioms (the first rule is sketched in Python below)

(know (operator arg …))  ⇒  (and (know arg) …)
(goal (know x))          ⇒  (know-goal x)
(goal (maintain x))      ⇒  (maintain-goal x)
(know (know x))          ⇒  true
(know (goal x))          ⇒  true
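As a rough illustration of the first axiom only, here is a small Python sketch (not GRL) that rewrites knowledge of a compound expression into a conjunction of knowledge of its arguments; the tuple encoding and the expand_know helper are invented for this example.

# Python sketch (not GRL): expand (know expr) one step, with expressions as nested tuples.
def expand_know(expr):
    if isinstance(expr, tuple):
        op, *args = expr
        if op in ('know', 'goal'):
            return True                                      # (know (know x)) / (know (goal x)) => true
        return ('and',) + tuple(('know', a) for a in args)   # (know (op arg ...)) => (and (know arg) ...)
    return ('know', expr)                                    # atomic signal: nothing to reduce

print(expand_know(('min', 'front-sonar', 'front-left-sonar')))
# => ('and', ('know', 'front-sonar'), ('know', 'front-left-sonar'))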
Goal reduction API
• (define-signal s (and a b c …)) (define-reduction s parallel)
  • When s is a goal, all its inputs are goals
  • This is what was shown three slides ago
• (define-signal s (and a b c …)) (define-reduction s serial)
  • When s is a goal, a is a goal
  • When s is a goal and a is true, b is a goal
  • When s is a goal and both a and b are true, c is a goal
(A small Python sketch of the two reductions follows below.)
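A Python sketch (not GRL) of what the two reductions mean for a conjunction (and a b c): parallel makes every conjunct a goal at once, while serial only makes a conjunct a goal after all earlier ones are true. The function names and the truth-list encoding are invented for illustration.

# Python sketch (not GRL): which conjuncts of a goal conjunction are themselves goals.
def parallel_goals(conjuncts):
    # Parallel reduction: all inputs become goals immediately.
    return [name for name, _ in conjuncts]

def serial_goals(conjuncts):
    # Serial reduction: a conjunct is a goal only once all earlier conjuncts are true.
    goals = []
    for name, is_true in conjuncts:
        goals.append(name)
        if not is_true:
            break                     # later conjuncts aren't goals yet
    return goals

# (and a b c) as a goal, with a already achieved:
print(parallel_goals([('a', True), ('b', False), ('c', False)]))  # => ['a', 'b', 'c']
print(serial_goals([('a', True), ('b', False), ('c', False)]))    # => ['a', 'b']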
Useful functions
• (know-that x): True if (know x) and x
• (satisfied-goal x): True if x is a goal and is true
• (unsatisfied-goal x): True if x is a goal and is false
• (parallel-and a b c …): And gate with parallel goal reduction
• (serial-and a b c …): And gate with serial goal reduction
Planning
• Given
  • Goal (desired state of the environment)
  • Current state of the environment
  • Set of actions
  • Descriptions of how actions change the state of the environment
    • Actions are essentially functions from states to states
• Find a series of actions (called a plan) that will result in the desired goal state
A bad planning algorithm
• Key idea: simulate every possible series of actions until your simulation finds the goal (a runnable Python version follows below)

Plan(s, g) {
  for each action a {
    let s' = a(s)            // the state after running a
    if s' == g
      return a
    else
      try { return a + Plan(s', g) }
      catch backtrack {}     // Try another action
  }
  throw backtrack;
}
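For comparison, here is a minimal runnable Python version of the same brute-force depth-first search; the action table, the depth bound, and the toy integer domain are all invented for illustration rather than taken from the slides.

# Brute-force planner: depth-first search over action sequences (Python sketch).
# Actions are modeled as functions from (hashable) states to states.
def plan(state, goal, actions, max_depth=5, visited=frozenset()):
    """Return a list of action names reaching goal from state, or None."""
    if state == goal:
        return []
    if max_depth == 0:
        return None                       # "throw backtrack"
    visited = visited | {state}
    for name, act in actions.items():
        nxt = act(state)                  # the state after running the action
        if nxt in visited:                # avoid trivially looping plans
            continue
        rest = plan(nxt, goal, actions, max_depth - 1, visited)
        if rest is not None:
            return [name] + rest          # prepend this action to the sub-plan
    return None

# Toy domain: states are integers, actions increment or decrement.
actions = {'inc': lambda s: s + 1, 'dec': lambda s: s - 1}
print(plan(0, 3, actions))                # => ['inc', 'inc', 'inc']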
Complexity
• Have to search a tree of plans
• If there are n possible actions, there are n^m possible m-step plans (e.g. 10 actions and a 10-step plan already give 10^10 candidates)
• Naïve algorithm is exponential
• Clever optimizations possible, but it's still basically an exponential problem
Generalizations
• Conditional planning
  • Allow ifs inside of the plan to handle contingencies
  • More robust
  • More expensive to plan
• Automatic programming
  • Plans can be arbitrary programs
  • Fully undecidable
Generalizations (2)
• Markov Decision Problems (MDPs)
  • Actions aren't deterministic
    • Only know a probability distribution on the possible result states for each action
    • Actions are now functions from probability distributions to probability distributions
  • Plan can't be a program anymore (how do you know what the output state is?)
  • Payoff function that tells you how good a state is
  • Find the policy that gives you the best expected (i.e. average over the state probability distribution) payoff (see the value-iteration sketch below)
  • Really really expensive
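To make "policy" and "expected payoff" concrete, here is a small value-iteration sketch in Python; the two-state MDP (transition probabilities, payoffs, discount factor) is entirely made up for illustration and is not from the slides.

# Value iteration on a tiny made-up MDP: states 0 and 1, actions 'stay' and 'go'.
# P[s][a] lists (probability, next_state) pairs; R[s] is the payoff for being in s.
P = {0: {'stay': [(1.0, 0)], 'go': [(0.8, 1), (0.2, 0)]},
     1: {'stay': [(1.0, 1)], 'go': [(0.9, 0), (0.1, 1)]}}
R = {0: 0.0, 1: 1.0}
gamma = 0.9                                        # discount factor

V = {s: 0.0 for s in P}
for _ in range(100):                               # iterate until approximately converged
    V = {s: R[s] + gamma * max(sum(p * V[s2] for p, s2 in P[s][a]) for a in P[s])
         for s in P}

# The policy picks, in each state, the action with the best expected value.
policy = {s: max(P[s], key=lambda a: sum(p * V[s2] for p, s2 in P[s][a])) for s in P}
print(policy)                                      # => {0: 'go', 1: 'stay'}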
Generalizations (3)
• Partially Observable MDPs (POMDPs)
  • Actions aren't deterministic
  • Don't know what state you're in
    • Sensors only give us a probability distribution on states, not states
  • Policy has to map probability distributions (called "belief states") to actions, not states to actions
  • Payoff function that tells you how good a state is
  • Find the policy that gives you the best expected (i.e. average over the state probability distribution) payoff (a belief-update sketch follows below)
  • Really really really expensive
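The key new object in a POMDP is the belief state. Here is a short Python sketch of one belief-update step (a Bayes-filter step); the two-state example and its transition and observation models are invented for illustration.

# One belief-update step: predict through the action model, then reweight by the observation.
def update_belief(belief, action, obs, trans, obs_model):
    states = list(belief)
    predicted = {s2: sum(trans[s][action][s2] * belief[s] for s in states)
                 for s2 in states}
    unnorm = {s2: obs_model[s2][obs] * predicted[s2] for s2 in states}
    total = sum(unnorm.values())
    return {s2: unnorm[s2] / total for s2 in states}

# Toy example (made up): a 'look' action that doesn't move the robot, plus a noisy sensor.
trans = {'door': {'look': {'door': 1.0, 'wall': 0.0}},
         'wall': {'look': {'door': 0.0, 'wall': 1.0}}}
obs_model = {'door': {'see-door': 0.8, 'see-wall': 0.2},
             'wall': {'see-door': 0.3, 'see-wall': 0.7}}
print(update_belief({'door': 0.5, 'wall': 0.5}, 'look', 'see-door', trans, obs_model))
# belief shifts toward 'door': roughly {'door': 0.73, 'wall': 0.27}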
Generalizations (4)
• Can you detect a pattern here?
• How to get tenure
  • Find a complicated instance of a problem that current technology can't handle
  • Devise an elegant yet prohibitively expensive technology to solve it
  • Write a paper that starts with "To survive in complex dynamic worlds, an agent must …"
  • Add a description of your technique
  • Prove a lot of theorems about how your technique will solve all instances of the problem given more CPU time than the lifetime of the universe
  • Write: "Future work: make it fast"