Structured Models for Decision Making

Daphne Koller, Stanford University, koller@cs.Stanford.edu. MURI Program on Decision Making under Uncertainty, July 18, 2000

Roadmap (diagram connecting Bayes nets, PRMs, DBNs, dynamic PRMs, factored MDPs, and relational MDPs; techniques: static encapsulation and reuse; dynamic encapsulation and approximation; for the decision problem, factored policy iteration and efficient PRM inference)

Outline • Probabilistic Relational Models • Representing complex domains • Structural uncertainty • Temporal models • Decision making

Basic units of knowledge: entities, properties (attributes), and relations

BNs are not suitable for representing complex, structured, flexible domains. So what? • The set of entities and the relations between them is fixed at BN design time • structure must be known in advance • hard to adapt to changes • BNs for complex domains are large and unstructured • very hard to build • No ability to generalize • across “similar” individuals • across related situations

Probabilistic Relational Models • Combine advantages of predicate logic & BNs: • natural domain modeling: objects, properties, relations; • generalization over a variety of situations; • compact, natural probability models. • Integrate uncertainty with relational model: • properties of domain entities can depend on properties of related entities; • uncertainty over relational structure of domain.

Real-World Case Study: Battlefield situation assessment for missile units • several locations • many units • each has a detailed model. Example object classes: Battalion, Battery, Vehicle, Location, Weather. Example relations: At-Location, Has-Weather, Sub-battery/In-battalion, Sub-vehicle/In-battery.

Scud Battery: Simplified PRM (figure: attributes At-Location, Status Report, Under Fire, Next Mission, and the aggregate #(Launcher.status = ok) over the battery's Launchers)
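The aggregate #(Launcher.status = ok) summarizes a battery's launchers with a single count. The sketch below assumes that Next Mission depends on that count, so one CPD covers batteries of any size. It is a minimal Python sketch. The class layout, the "ok"/"damaged" status values, and the probability table are hypothetical and are not taken from the original model.

```python
from dataclasses import dataclass, field


@dataclass
class Launcher:
    name: str
    status: str  # hypothetical value set: "ok" or "damaged"


@dataclass
class ScudBattery:
    name: str
    # Sub-vehicle/In-battery relation: the launchers belonging to this battery
    launchers: list = field(default_factory=list)


def count_ok(battery):
    """Aggregate #(Launcher.status = ok) over the battery's launchers."""
    return sum(1 for launcher in battery.launchers if launcher.status == "ok")


def p_next_mission_launch(n_ok):
    """Hypothetical CPD P(Next Mission = launch | #ok launchers).
    One small table serves every battery, however many launchers it has."""
    return {0: 0.0, 1: 0.4}.get(n_ok, 0.8)


b1 = ScudBattery("Scud Battery 1", [Launcher("L1", "ok"), Launcher("L2", "damaged")])
print(count_ok(b1), p_next_mission_launch(count_ok(b1)))  # 1 0.4
```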

SCUD Battery Model

Cargo Vehicle Group

Original BN*: SCUD Battery Disadvantages • A lot more complex • must include relevant attributes of related objects • Hard to transfer information between different BN models *Built by IET, Inc.

Situation Models (figure: Angel Island and Alcatraz; 3rd and 17th Scud Battalions; Scud Batteries 1, 2, and 3; Launcher 1) • Complex situations can be described compactly by specifying objects and relations between them • Class model is instantiated for each object, with probabilistic dependencies induced by relations

Example reasoning pattern (figure: evidence flowing among Scud-Battalion-Charlie, Battery1, and launchers TL1 and TL2 in Group-TLs through attributes such as under_fire, hit, damaged, hide-support, rep_damaged, and #reported_damaged)

Inference in PRMs • PRM + situation description induces a BN over attributes (figure: induced BN with nodes such as Attack, Under Fire, B1.Launch, B2.Launch, B1.L1.Damaged, B2.L1.Damaged, B1.L1.Report, B2.L1.Report, B1.Success, and B2.Success, for Scud batteries of the 3rd and 17th Scud Battalions at Angel Island and Alcatraz)
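The following sketch shows one way a situation description plus class-level dependencies might induce the ground BN. Each template says which attribute of which related object an attribute depends on. Grounding then follows the relations for every object. The template format, the relation names ("launchers", "battery"), and the attribute names are assumptions chosen to mirror the node names above.

```python
# Hypothetical class-level dependency templates:
# (class, attribute) -> list of (relation, parent attribute); relation None = same object.
TEMPLATES = {
    ("Battery", "UnderFire"): [],
    ("Battery", "Launch"): [(None, "UnderFire"), ("launchers", "Damaged")],
    ("Launcher", "Damaged"): [("battery", "UnderFire")],
    ("Launcher", "Report"): [(None, "Damaged")],
}


def ground_bn(objects, relations):
    """Instantiate the class model for every object.

    objects:   {object name: class name}
    relations: {(object name, relation): [related object names]}
    Returns the ground BN as {node: [parent nodes]}, nodes named "object.attribute".
    """
    edges = {}
    for obj, cls in objects.items():
        for (tcls, attr), deps in TEMPLATES.items():
            if tcls != cls:
                continue
            parents = []
            for rel, parent_attr in deps:
                targets = [obj] if rel is None else relations.get((obj, rel), [])
                parents.extend(f"{t}.{parent_attr}" for t in targets)
            edges[f"{obj}.{attr}"] = parents
    return edges


objects = {"B1": "Battery", "B1.L1": "Launcher", "B1.L2": "Launcher"}
relations = {
    ("B1", "launchers"): ["B1.L1", "B1.L2"],
    ("B1.L1", "battery"): ["B1"],
    ("B1.L2", "battery"): ["B1"],
}
bn = ground_bn(objects, relations)
print(bn["B1.Launch"])  # ['B1.UnderFire', 'B1.L1.Damaged', 'B1.L2.Damaged']
```

Adding a launcher to the situation only requires a new entry in `objects` and `relations`; the templates stay unchanged.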

Exploit Structure for Inference • Encapsulation: objects interact in limited ways • Inference can be encapsulated within objects, with “communication” limited to interfaces • Reuse: objects from the same class have the same model • Inference for one can be reused for others
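One simple way to realize reuse is to cache an inference result keyed by a class's interface inputs. Every object of that class that sees the same inputs then shares one computation. The sketch uses Python's `functools.lru_cache`. The launcher model and its numbers are hypothetical.

```python
from functools import lru_cache

# Hypothetical launcher class model shared by every Launcher object.
P_DAMAGED = {True: 0.3, False: 0.05}  # P(Damaged | UnderFire)
P_REPORT = {True: 0.9, False: 0.1}    # P(Report = damaged | Damaged)


@lru_cache(maxsize=None)
def launcher_posterior(under_fire, reported_damaged):
    """P(Damaged | UnderFire, Report) for one launcher; cached per distinct input."""
    prior = P_DAMAGED[under_fire]
    like_dmg = P_REPORT[True] if reported_damaged else 1 - P_REPORT[True]
    like_ok = P_REPORT[False] if reported_damaged else 1 - P_REPORT[False]
    return prior * like_dmg / (prior * like_dmg + (1 - prior) * like_ok)


# 100 launchers under fire, but only two distinct (UnderFire, Report) inputs.
cases = [(True, True)] * 50 + [(True, False)] * 50
posteriors = [launcher_posterior(uf, rep) for uf, rep in cases]
print(launcher_posterior.cache_info().misses)  # 2
```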

Effects of exploiting structure (plot: running time in seconds, 0 to 6000, vs. number of vehicles of each type per battery, 1 to 10, for a flat BN, no reuse, and with reuse)

Extension: Structural Uncertainty • Uncertainty about model structure: • Set of objects: is that radar signal from a tank? • Relations between objects: location of SCUD-Battalion-C • Task 1: Seamless integration with the probabilistic model • structural variables can depend on other variables • Task 2: Efficient inference • Use approximate inference to simplify the model • variational methods to summarize multiple potential influences • MCMC for traversing possible relationships • Use structured inference (encapsulation/reuse) on the simplified model
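Here is a toy illustration of MCMC over a relational structural variable. The location of SCUD-Battalion-C is unknown, reports depend on it, and a Metropolis sampler moves between candidate locations. The third location ("Treasure Island"), the prior, and the report model are hypothetical. The space here is small enough to enumerate, which lets the estimate be checked against the exact posterior. MCMC pays off when the set of possible relationships is too large to enumerate.

```python
import random

# Hypothetical candidate values for the At-Location relation of SCUD-Battalion-C.
LOCATIONS = ["Angel Island", "Alcatraz", "Treasure Island"]
PRIOR = {"Angel Island": 0.5, "Alcatraz": 0.3, "Treasure Island": 0.2}


def unnormalized(loc, reports):
    """Prior times a hypothetical report model: each report names the true
    location w.p. 0.7 and each of the other two locations w.p. 0.15."""
    p = PRIOR[loc]
    for r in reports:
        p *= 0.7 if r == loc else 0.15
    return p


def metropolis(reports, n_samples=20000, seed=0):
    """Metropolis sampling over which location the battalion is at.
    The proposal (uniform over the other locations) is symmetric, so the
    acceptance ratio is just the ratio of unnormalized posteriors."""
    rng = random.Random(seed)
    loc = LOCATIONS[0]
    counts = dict.fromkeys(LOCATIONS, 0)
    for _ in range(n_samples):
        proposal = rng.choice([other for other in LOCATIONS if other != loc])
        if rng.random() < unnormalized(proposal, reports) / unnormalized(loc, reports):
            loc = proposal
        counts[loc] += 1
    return {name: c / n_samples for name, c in counts.items()}


reports = ["Alcatraz", "Alcatraz", "Angel Island"]
estimate = metropolis(reports)
# Exact posterior, feasible here only because the structure space is tiny.
z = sum(unnormalized(name, reports) for name in LOCATIONS)
exact = {name: unnormalized(name, reports) / z for name in LOCATIONS}
print(estimate, exact)
```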

Outline • Probabilistic Relational Models • Temporal models • Structured belief-state tracking • Dynamic PRMs: time, events and actions • Decision making

Dynamic Bayesian Nets • Compact representation of system dynamics • discrete, continuous, hybrid • Generalization of Kalman filters (figure: DBN unrolled over t, t+1, t+2 with variables Action, Velocity, Position, and Observed_pos)
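When every CPD in this DBN is linear-Gaussian, filtering it is exactly a Kalman filter. That is the sense in which DBNs generalize Kalman filters. The sketch below performs one predict/update step for the Position/Velocity/Observed_pos network. It assumes that Action is an acceleration, a time step dt, and process and sensor noise variances q and r. All of these are hypothetical choices.

```python
def kalman_step(mean, cov, action, z, dt=1.0, q=0.01, r=0.25):
    """One predict/update step of a Kalman filter for the DBN
        Position(t+1) = Position(t) + dt*Velocity(t) + 0.5*dt**2*Action(t)
        Velocity(t+1) = Velocity(t) + dt*Action(t)
        Observed_pos(t+1) = Position(t+1) + noise,  noise ~ N(0, r)
    mean = [position, velocity]; cov = 2x2 list of lists.
    Process noise q*I, the noise levels, and Action-as-acceleration are assumptions."""
    p, v = mean
    (a, b), (c, d) = cov
    # Predict: x' = F x + B u with F = [[1, dt], [0, 1]]; covariance F P F^T + Q.
    p_pred = p + dt * v + 0.5 * dt * dt * action
    v_pred = v + dt * action
    pp = a + dt * (b + c) + dt * dt * d + q
    pv = b + dt * d
    vp = c + dt * d
    vv = d + q
    # Update with H = [1, 0] (only position is observed).
    s = pp + r                      # innovation variance
    k_p, k_v = pp / s, vp / s       # Kalman gain K = P H^T / s
    innovation = z - p_pred
    new_mean = [p_pred + k_p * innovation, v_pred + k_v * innovation]
    new_cov = [[(1 - k_p) * pp, (1 - k_p) * pv],   # (I - K H) P_pred
               [vp - k_v * pp, vv - k_v * pv]]
    return new_mean, new_cov


mean, cov = [0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]
for z in [1.1, 1.9, 3.2, 3.9]:  # hypothetical position readings, no control input
    mean, cov = kalman_step(mean, cov, 0.0, z)
print(mean, cov[0][0])
```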

Tracking System State • Task: maintain the belief state, i.e., the distribution over the current state given the evidence so far • In discrete/hybrid systems, the belief state representation is exponential in the # of state variables • In hybrid systems, the # of distinct hypotheses grows exponentially over time (figure: the same DBN over Action, Velocity, Position, and Observed_pos at t, t+1, t+2)
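The following sketch makes the exponential blow-up concrete with exact discrete belief-state tracking. The belief is a table over all 2^n joint assignments of n binary variables. Each step is a prediction followed by conditioning on evidence. The transition and sensor models are hypothetical.

```python
from itertools import product
from math import prod


def exact_belief_update(belief, transition, likelihood):
    """One exact filtering step over a full joint table.

    belief: {state tuple: probability} over all 2**n joint states.
    transition(s, s2) = P(s2 | s); likelihood(s2) = P(evidence | s2).
    Cost is O(|states|**2) = O(4**n) per step.
    """
    new = {}
    for s2 in belief:
        predicted = sum(p * transition(s, s2) for s, p in belief.items())
        new[s2] = predicted * likelihood(s2)
    z = sum(new.values())
    return {s: p / z for s, p in new.items()}


# Hypothetical model: each of n binary variables keeps its value w.p. 0.9,
# and a noisy sensor reports variable 0 as "1".
def transition(s, s2):
    return prod(0.9 if a == b else 0.1 for a, b in zip(s, s2))


def likelihood(s2):
    return 0.8 if s2[0] == 1 else 0.2


n = 3
belief = {s: 1 / 2**n for s in product((0, 1), repeat=n)}
belief = exact_belief_update(belief, transition, likelihood)
print(len(belief), sum(p for s, p in belief.items() if s[0] == 1))  # 8 entries; P(X0 = 1) about 0.8
```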

Approximate Tracking • Decompose the belief state along “subsystem lines” • Maintain the belief state as a product of marginals • In hybrid systems, keep a mixture of hypotheses for every subsystem • Merge hypotheses whose densities are similar
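The sketch below applies the product-of-marginals idea to two binary variables. It propagates exactly for one step, conditions on the evidence, then projects back onto per-variable marginals. That projection discards the correlations the step introduced. Here each "subsystem" is a single variable and the model is hypothetical. For clarity this toy version rebuilds the full joint, which a practical implementation would avoid.

```python
from itertools import product
from math import prod


def joint_from_marginals(marginals):
    """Expand a product-of-marginals belief over binary variables into a joint table."""
    return {s: prod(m if v else 1 - m for v, m in zip(s, marginals))
            for s in product((0, 1), repeat=len(marginals))}


def project(joint, n):
    """Keep only the single-variable marginals P(X_i = 1); correlations are dropped."""
    return [sum(p for s, p in joint.items() if s[i] == 1) for i in range(n)]


def approx_step(marginals, transition, likelihood):
    """Propagate exactly for one step, condition on evidence, project back.
    Toy version: it materializes the joint, which only works for tiny n."""
    prior = joint_from_marginals(marginals)
    post = {s2: sum(p * transition(s, s2) for s, p in prior.items()) * likelihood(s2)
            for s2 in prior}
    z = sum(post.values())
    return project({s: p / z for s, p in post.items()}, len(marginals))


# Hypothetical two-variable system: X persists, Y' tends to follow X.
def transition(s, s2):
    (x, _y), (x2, y2) = s, s2
    px = 0.9 if x2 == x else 0.1
    py1 = 0.8 if x == 1 else 0.2
    return px * (py1 if y2 == 1 else 1 - py1)


def likelihood(s2):
    return 0.9 if s2[1] == 1 else 0.1  # noisy sensor that reads Y as 1


marginals = approx_step([0.5, 0.5], transition, likelihood)
print(marginals)  # approximately [0.692, 0.9]
```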

Case Study: Diagnosis & Tracking for a Five-Tank System (figure: five-tank system; observables include F1o, F23, F5o) • State space per time slice: • eleven-dimensional continuous space • 227 discrete failure modes

The doomsday scenario (plot over time steps 0 to 50: bursts and negative drifts in C12, C23, and C45, plus measurement errors on F23 and F5o)

Algorithm Performance (plot: two panels tracking C12, P5, and C45 over time steps 0 to 50, one labeled Omniscient Kalman Filter)

Dynamic PRMs • Goal: Model complex structured systems • that evolve over time • where agents take compound structured actions • and construct an effective, scalable inference algorithm • Easy part: Add a time relation to PRMs • Allows notion of current and previous state • Maintains notions of structured objects and relations • Challenges: • Appropriate representation for actions, events • Modeling changes in domain structure (objects, relations) • Effective inference that exploits structure

Dynamic PRMs: Event Models • Events: discrete points where the system undergoes a discontinuous change • Events can be triggered externally, e.g., by an agent’s action, or by system dynamics, e.g., a unit reaches its destination • Events can influence the system structure: • discrete change in continuous dynamics • truck velocity goes to 0 when the destination is reached • modification of relational structure • an aircraft taking off is no longer on the aircraft carrier • creation / deletion of objects • units entering/leaving the battlespace
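The truck example can be sketched as a piecewise model. The event time is computed from the continuous dynamics, and the velocity then jumps to 0. The one-dimensional, constant-velocity setup is an assumption for illustration.

```python
def simulate_move(position, velocity, destination, t_end):
    """Hypothetical 1-D Move with one event: the truck travels at constant velocity
    until it reaches its destination, at which point its velocity jumps to 0.
    Returns (event time or None, position at t_end, velocity at t_end)."""
    gap = destination - position
    if velocity != 0 and gap / velocity >= 0:  # heading toward the destination
        t_event = gap / velocity
        if t_event <= t_end:
            # Discontinuous change: the continuous dynamics switch to "stopped".
            return t_event, destination, 0.0
    return None, position + velocity * t_end, velocity


print(simulate_move(0.0, 10.0, 50.0, 8.0))  # (5.0, 50.0, 0.0): event at t = 5
print(simulate_move(0.0, 10.0, 50.0, 3.0))  # (None, 30.0, 10.0): still moving
```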

Dynamic PRMs: Adding Actions • Use a relational / hierarchical action representation • class hierarchy for the Move action • an instantiation of a particular action is related to the object moving, the road taken, the origin, and the destination • Actions can depend on and influence attributes of related objects • the duration of a Move action may depend on road condition and influence the status of moving objects • Like events, actions can change domain structure • Complex actions can be composed of simpler ones: • effects of a complex action are derived from those of its subactions

Inference in Dynamic Systems • Main tasks: • situation monitoring • prediction • Goal: Exploit structure as we did in PRMs • First step: Encapsulation • Exploit structure of weakly interacting subsystems • Applied successfully to Dynamic Bayesian Nets

Tracking in Dynamic PRMs • Use relational structure to guide belief state approximation • direct dependencies only between related objects • Deal with dynamic structure: • relations and even domain objects change over time • want to adjust our approximation to context • structural uncertainty critical • Event-driven tracking • no reason to use a fine-grained model of the “boring bits” • but “fast forward” requires the ability to propagate dynamics over variable-length segments
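Consider a discrete-state model with a fixed transition matrix and no evidence during a "boring" stretch. Fast-forwarding over k steps then amounts to multiplying the belief by T^k. Repeated squaring computes T^k with O(log k) matrix products. The plain-Python sketch below works under that assumption. The 2-state matrix is hypothetical.

```python
def mat_mul(a, b):
    """Plain-Python matrix product of lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_pow(m, k):
    """m**k by repeated squaring: O(log k) matrix products instead of k."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    while k > 0:
        if k & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        k >>= 1
    return result


def fast_forward(belief, transition, k):
    """Advance a row-vector belief over k evidence-free ("boring") steps at once."""
    return mat_mul([belief], mat_pow(transition, k))[0]


T = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical 2-state transition matrix
jumped = fast_forward([1.0, 0.0], T, 37)

stepped = [1.0, 0.0]
for _ in range(37):  # the same segment, one step at a time
    stepped = mat_mul([stepped], T)[0]
print(jumped, stepped)
```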

Outline • Probabilistic Relational Models • Temporal models • Decision making • Planning in factored MDPs • Planning in relational MDPs

What is a Markov Decision Process? • An MDP is a controlled dynamic process • Stochastic transition between states • Actions affect system dynamics • Rewards or costs are associated with states • Objective: Drive process to regions of high reward • MDP solutions are policies • Policies assign an action to every state

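MDP Policies & Value Functions • Suppose an expert told you the “value” of each state: V(s1) = 10, V(s2) = 5 (figure: Action 1 and Action 2 from the current state; one action reaches s1 with probability 0.7 and s2 with probability 0.3, the other reaches each with probability 0.5)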

Greedy Policy Construction • Pick the action with the highest expected future value, i.e., the expectation over next-state values: $\pi(s) = \arg\max_a \sum_{s'} P(s' \mid s, a)\, V(s')$
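Take the example with V(s1) = 10 and V(s2) = 5. The action with transition probabilities 0.7/0.3 has expected next-state value 0.7·10 + 0.3·5 = 8.5. The 0.5/0.5 action has 0.5·10 + 0.5·5 = 7.5. The greedy rule therefore picks the former. Below is a Python sketch of that computation. The labels `action_a` and `action_b` are hypothetical, since the text does not make clear which distribution belongs to Action 1 and which to Action 2.

```python
V = {"s1": 10.0, "s2": 5.0}  # values supplied by the "expert" in the example

# Next-state distributions of the two actions in the example; "action_a" and
# "action_b" are hypothetical labels for the 0.7/0.3 and 0.5/0.5 actions.
P = {
    "action_a": {"s1": 0.7, "s2": 0.3},
    "action_b": {"s1": 0.5, "s2": 0.5},
}


def expected_next_value(action):
    """Sum over s' of P(s' | s, a) * V(s')."""
    return sum(p * V[s_next] for s_next, p in P[action].items())


def greedy_action():
    """The action with the highest expected next-state value."""
    return max(P, key=expected_next_value)


print({a: expected_next_value(a) for a in P}, greedy_action())
```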

Bootstrapping: Policy Iteration • Idea: greedy selection is useful even with a suboptimal V • Guess V; repeat until the policy doesn’t change: π = greedy(V); V = value of acting on π • Guaranteed to find the globally optimal policy if V is defined over explicit states, i.e., if the representation of V is exponential in the number of state variables • Exploit structure with factored policy iteration
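The following is a minimal sketch of explicit-state policy iteration on a hypothetical three-state, two-action MDP with discount 0.9. Policy evaluation here is iterative rather than an exact linear solve. The greedy step keeps the current action on near-ties, so the "policy doesn't change" test terminates reliably. The value table V has one entry per explicit state, and that table is what becomes exponential in the number of state variables.

```python
GAMMA = 0.9
STATES = ["s0", "s1", "s2"]
ACTIONS = ["stay", "go"]
# Hypothetical MDP: P[s][a] = {next_state: prob}; R[s] = reward for being in s.
P = {
    "s0": {"stay": {"s0": 1.0}, "go": {"s1": 0.8, "s0": 0.2}},
    "s1": {"stay": {"s1": 1.0}, "go": {"s2": 0.8, "s0": 0.2}},
    "s2": {"stay": {"s2": 1.0}, "go": {"s0": 1.0}},
}
R = {"s0": 0.0, "s1": 1.0, "s2": 10.0}


def q_value(s, a, V):
    return R[s] + GAMMA * sum(p * V[s2] for s2, p in P[s][a].items())


def evaluate(policy, tol=1e-10):
    """V = value of acting on the policy (iterative policy evaluation)."""
    V = {s: 0.0 for s in STATES}
    while True:
        new = {s: q_value(s, policy[s], V) for s in STATES}
        if max(abs(new[s] - V[s]) for s in STATES) < tol:
            return new
        V = new


def greedy(V, current=None):
    """Greedy policy w.r.t. V; keeps the current action on (near-)ties."""
    policy = {}
    for s in STATES:
        best = max(ACTIONS, key=lambda a: q_value(s, a, V))
        if current and q_value(s, current[s], V) >= q_value(s, best, V) - 1e-9:
            best = current[s]
        policy[s] = best
    return policy


def policy_iteration():
    V = {s: 0.0 for s in STATES}  # "Guess V"
    policy = greedy(V)
    while True:
        V = evaluate(policy)
        new_policy = greedy(V, policy)
        if new_policy == policy:  # stop when the policy doesn't change
            return policy, V
        policy = new_policy


policy, V = policy_iteration()
print(policy)
```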

Factored MDPs: DBNs + Rewards (figure: two-slice DBN over X, Y, Z at times t and t+1, with reward nodes R1 and R2) • Rewards have small sets of parent variables too • Total reward adds sub-rewards: R = R1 + R2
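Here is a sketch of a factored MDP in this style over binary variables X, Y, Z. The transition probability is a product of local CPDs, and each CPD reads only its parents. The reward is a sum of sub-rewards with small scopes. The parent sets and numbers are hypothetical.

```python
# Hypothetical factored MDP over binary variables X, Y, Z.
# Transition: each next-step variable has a small parent set in the DBN.
CPDS = {  # var -> (parents at time t, P(var at t+1 = 1 | parents))
    "X": (("X",), lambda x: 0.9 if x else 0.1),
    "Y": (("X", "Y"), lambda x, y: 0.8 if (x or y) else 0.05),
    "Z": (("Y", "Z"), lambda y, z: 0.7 if (y and z) else 0.2),
}

# Rewards also have small parent sets: R = R1 + R2.
REWARDS = [
    (("X", "Y"), lambda x, y: 1.0 if (x and y) else 0.0),  # R1
    (("Z",), lambda z: 2.0 if z else 0.0),                  # R2
]


def transition_prob(state, next_state):
    """P(next_state | state) as a product of local CPDs."""
    p = 1.0
    for var, (parents, cpd) in CPDS.items():
        p1 = cpd(*(state[v] for v in parents))
        p *= p1 if next_state[var] else 1 - p1
    return p


def total_reward(state):
    """Sum of sub-rewards, each reading only its own scope."""
    return sum(r(*(state[v] for v in scope)) for scope, r in REWARDS)


s = {"X": 1, "Y": 0, "Z": 1}
print(total_reward(s), transition_prob(s, {"X": 1, "Y": 1, "Z": 0}))
```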

Linearly Decomposable Value Functions • Approximate a high-dimensional value function with a combination of lower-dimensional functions • Note: overlapping is allowed! • Motivation: multi-attribute utility theory (Keeney & Raiffa)

Decomposable Value Functions • Each basis function $h_i$ is the status of some small part(s) of a complex system • status of a machine • inventory of a store • status of a subgoal • Linear combination of restricted-domain functions: $V(\mathbf{x}) \approx \sum_i w_i\, h_i(\mathbf{x})$
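The sketch below shows a linearly decomposable value function with overlapping scopes. The machine and store variables, the basis functions, and the weights are hypothetical. V is stored as a few weights over small-scope functions instead of a table over the full joint state.

```python
# Hypothetical basis functions, each with a small (possibly overlapping) scope.
BASIS = [
    (("M1",), lambda m1: 1.0 if m1 == "working" else 0.0),                  # machine 1 status
    (("M1", "M2"), lambda m1, m2: 1.0 if m1 == m2 == "working" else 0.0),   # overlaps M1
    (("Stock",), lambda stock: min(stock, 10) / 10.0),                      # store inventory
]
WEIGHTS = [5.0, 3.0, 2.0]  # hypothetical w_i


def value(state):
    """V(x) = sum_i w_i * h_i(x restricted to scope_i)."""
    return sum(w * h(*(state[v] for v in scope))
               for w, (scope, h) in zip(WEIGHTS, BASIS))


print(value({"M1": "working", "M2": "working", "Stock": 5}))  # 9.0
```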

Exploiting Structure (figure: DBN transition over X, Y, Z) • Key operation: backprojection of a basis function through a DBN transition • Structure allows us to consider operations over small subsets of variables, not the entire state space
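Here is a sketch of backprojection for a single basis function whose scope is {Y'}. Its expected value after a transition depends only on the DBN parents of Y'. The result is therefore a table over those parents rather than over the whole state. The parent set {X, Y}, the CPD, and the basis function are hypothetical.

```python
from itertools import product

# Hypothetical DBN fragment: Y' (Y at t+1) has parents {X, Y} at time t.
P_Y_NEXT = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.6, (1, 1): 0.95}  # P(Y'=1 | X, Y)


def h(y_next):
    """Hypothetical basis function whose scope is just {Y'}."""
    return 2.0 if y_next else -1.0


def backproject(basis, cpd):
    """g(x, y) = E[basis(Y') | X=x, Y=y].

    Because the basis function only mentions Y', the expectation only needs
    Y's parents: g is a 4-entry table, whatever the size of the full state."""
    return {pa: cpd[pa] * basis(1) + (1 - cpd[pa]) * basis(0)
            for pa in product((0, 1), repeat=2)}


g = backproject(h, P_Y_NEXT)
# Other state variables (Z, ...) are simply ignored when reading g.
state = {"X": 1, "Y": 0, "Z": 1}
print(g[(state["X"], state["Y"])])
```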

Policy Format (figure: Action 1 and Action 2 with values +11, +1, +7, +4, +12, +8) • Factored value functions and compact action effect descriptions • Sorted result values form a decision list: if … then action 1, else if … then action 2, else if … then action 1
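The following sketch shows how sorted branch values can yield a decision-list policy. Branches are ordered by value, and the policy fires the first one whose condition holds. The conditions, actions, and values below are hypothetical placeholders, since the original conditions are not recoverable from the text.

```python
# Hypothetical branches produced by a factored greedy step:
# (condition on a few state variables, action, value of taking that branch).
BRANCHES = [
    (lambda s: s["Z"] == 1, "Action 2", 6.0),
    (lambda s: True, "Action 1", 0.0),                   # default branch
    (lambda s: s["X"] == 1 and s["Y"] == 0, "Action 1", 9.0),
]


def build_decision_list(branches):
    """Sort branches by value, best first: if c1 then a1 else if c2 then a2 ..."""
    return sorted(branches, key=lambda b: b[2], reverse=True)


def act(decision_list, state):
    """Return the action of the first branch whose condition holds."""
    for condition, action, _value in decision_list:
        if condition(state):
            return action
    raise ValueError("no branch matched")


policy = build_decision_list(BRANCHES)
print(act(policy, {"X": 1, "Y": 0, "Z": 1}), act(policy, {"X": 0, "Y": 0, "Z": 1}))
```

Sorting matters: without it, the Z branch listed first would override the higher-valued X/Y branch.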

Factored Policy Iteration: Summary • Structure induces a decision-list policy • Guess V; repeat: π = greedy(V); V = value of acting on π • Key operations are isomorphic to BN inference • Time per iteration reduced from $O((2^n)^3)$ to $O(C_b k^3)$ • $C_b$ = cost of Bayes net inference (a function of the structure) • $k$ = number of basis functions ($k \ll 2^n$)

Run Times (plot: number of states and CPU seconds vs. number of state variables, 4 to 16, with a 3n^3 reference curve; y-axis CPU Seconds/States, up to 70,000) Note: Nearly optimal policy found in all cases ( 6).

Planning in Relational MDPs • Replace DBN transition model with dynamic PRM • Generalize factored policy iteration • Define basis functions via relational formulas • Replace BN inference with PRM inference as key step • Exploit hierarchical structure of complex actions by encapsulating decision making along hierarchy • Potential benefits: • Tractable approximate planning in relational domains • Unification of classical and stochastic planning

Conclusions: Past & Present • PRMs compactly represent complex systems with multiple interacting objects: • coherent (probabilistic) semantics; • structured representation: modularity & reuse. • Scalable inference that exploits structure • Tracking algorithms for DBNs that exploit system decomposition • Planning algorithms in MDPs that exploit structure of system and of value functions Theme: Representation & inference scale up, if we exploit structure

Conclusions: Future • Better inference for densely connected PRMs • Extending PRMs with time, events, actions • Exploit structure for inference in dynamic PRMs: • system decomposition into subsystems • relational context • varying time granularity • Planning in dynamic PRMs: • extend factored policy iteration to PRMs • exploit hierarchical action decomposition

Acknowledgements. Students & postdocs: Nir Friedman (Hebrew U.), Dirk Ormoneit, Ron Parr (Duke), Xavier Boyen, Urszula Chajewska, Lise Getoor, Carlos Guestrin, Uri Lerner, Uri Nodelman, Avi Pfeffer (Harvard), Eran Segal, Benjamin Taskar, Simon Tong, Brian Milch (Berkeley), Ken Takusagawa (MIT). Support: PECASE Award via ONR YIP; DARPA’s HPKB Program; MURI Program “Integrated Approach to Intelligent Systems”; Sloan Faculty Fellowship; DARPA’s IA Program under subcontract to SRI International; DARPA’s DMIF Program under subcontract to IET Inc.; ONR grant. http://robotics.stanford.edu/~koller/
