Belief Propagation for Structured Decision Making
Qiang Liu, Alexander Ihler
Department of Computer Science, University of California, Irvine
Abstract

• Variational inference methods such as loopy BP have revolutionized inference abilities on graphical models.
• Influence diagrams (or decision networks) are extensions of graphical models for representing structured decision-making problems.
• Our contributions:
  • A general variational framework for solving influence diagrams
  • A junction graph belief propagation algorithm for IDs, with an intuitive interpretation and strong theoretical guarantees
  • A convergent double-loop algorithm
  • Significant empirical improvement over the baseline algorithm

Graphical Models and Variational Methods

• Graphical models:
  • Factors and exponential family form: p(x) ∝ exp(θ(x)), with θ(x) = Σ_α θ_α(x_α)
  • Graphical representations: Bayes nets, Markov random fields, …
• Inference: answering queries about graphical models, e.g., calculating the (log) partition function log Z = log Σ_x exp(θ(x))
• Variational methods:
  • Log-partition function duality: log Z = max_{τ ∈ M} { ⟨θ, τ⟩ + H(τ) }
  • Junction graph BP: approximate the marginal polytope M by a locally consistent polytope L, and the entropy H by a Bethe–Kikuchi approximation; loopy junction graph BP solves the resulting optimization

Variational Framework for Structured Decision Making

• Main result: the log maximum expected utility has the dual form
  log MEU(θ) = max_{τ ∈ M} { ⟨θ, τ⟩ + H(τ) − Σ_{i ∈ D} H_τ(x_i | x_{pa(i)}) }.
  If τ* attains the maximum, the optimal strategy is δ*_i(x_i | x_{pa(i)}) = τ*(x_i | x_{pa(i)}).
• Intuition: the last term (subtracting the conditional entropies at the decision nodes) encourages, and at the optimum causes, the policies to be deterministic.
• Perfect recall → convex optimization (easier); imperfect recall → non-convex optimization (harder).
• Significance:
  • Enables converting arbitrary variational methods into MEU algorithms
  • "Integrates" the policy evaluation and policy improvement steps (avoiding expensive inner loops)

Our Algorithms

• Junction graph belief propagation for MEU:
  • Construct a junction graph over the augmented distribution.
  [Figure: an influence diagram, its augmented distribution as a factor graph, and the resulting junction graph, with the decision cluster of d1 and the normal clusters marked.]
  • For each decision node d, identify a unique cluster (called a decision cluster) that includes d and its parents; all other clusters are normal clusters.
  • Message-passing algorithm:
    • Sum-messages (from normal clusters): standard sum-product updates.
    • MEU-messages (from decision clusters): updates that also perform the policy optimization at the cluster's decision node.
• Strong local optimality: fixed points are provably at least as good as the policy-by-policy optima found by SPU.
• Convergent algorithm by the proximal point method: iteratively optimize a smoothed objective.

Influence Diagrams

• An influence diagram has three node types:
  • Chance nodes (C), each with a conditional probability p(x_i | x_{pa(i)}).
  • Decision nodes (D), each with a decision rule (policy) δ_i(x_i | x_{pa(i)}).
  • Utility nodes (U), each with a local utility function; the global utility function combines them multiplicatively or additively.
  [Figure: example influence diagram with nodes Weather, Forecast, Activity, and Happiness, and its optimal policies.]
• Maximum expected utility (MEU): choose the policies δ to maximize the expected global utility EU(δ), under the augmented distribution
  q(x; δ) ∝ ∏_{i ∈ C} p(x_i | x_{pa(i)}) ∏_{i ∈ D} δ_i(x_i | x_{pa(i)}) u(x).
• Perfect recall: every decision node observes all earlier decision nodes and their parents (along a "temporal" order); MEU can then be computed by the sum-max-sum rule (dynamic programming).
• Perfect recall is often unrealistic: memory limits, decentralized systems.
• Imperfect recall: no closed-form solution; the dominant algorithm is single policy updating (SPU), which guarantees only policy-by-policy optimality.

Experiments

• Diagnostic networks (UAI'08 inference challenge).
• Decentralized sensor network.
• Toy example comparing perfect and imperfect recall.
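The sum-max-sum rule above can be illustrated with a minimal sketch: under perfect recall, interleaving sums over chance nodes with maxes over decisions (in temporal order) gives the same value as brute-force search over all deterministic policies. The two-chance, two-decision diagram, its probabilities, and its utility below are invented for this sketch; they are not the paper's experiments.

```python
# Toy sum-max-sum rule under perfect recall, checked against brute force.
# Diagram (all binary, invented for illustration): chance c1 -> decision d1
# (observes c1) -> chance c2 (depends on c1, d1) -> decision d2 (perfect
# recall: observes c1, d1, c2) -> utility u(c1, d1, c2, d2).
import itertools

p_c1 = {0: 0.3, 1: 0.7}                                  # p(c1)

def p_c2(c2, c1, d1):                                    # p(c2 | c1, d1)
    q = 0.8 if d1 == c1 else 0.4
    return q if c2 == c1 else 1.0 - q

def u(c1, d1, c2, d2):                                   # made-up utility
    return 1.0 * (d2 == c2) + 0.5 * (d1 == c1)

# Sum-max-sum (dynamic programming): sum over chance nodes and max over
# decision nodes, interleaved along the temporal order c1, d1, c2, d2.
meu_dp = sum(
    p_c1[c1] * max(
        sum(p_c2(c2, c1, d1) * max(u(c1, d1, c2, d2) for d2 in (0, 1))
            for c2 in (0, 1))
        for d1 in (0, 1))
    for c1 in (0, 1))

# Brute force: enumerate all deterministic policies delta1: c1 -> d1 and
# delta2: (c1, d1, c2) -> d2, and keep the best expected utility.
obs2 = list(itertools.product((0, 1), repeat=3))         # d2's observations
meu_bf = max(
    sum(p_c1[c1] * p_c2(c2, c1, pol1[c1]) *
        u(c1, pol1[c1], c2, pol2[(c1, pol1[c1], c2)])
        for c1 in (0, 1) for c2 in (0, 1))
    for pol1 in ({0: a, 1: b} for a in (0, 1) for b in (0, 1))
    for pol2 in (dict(zip(obs2, vals))
                 for vals in itertools.product((0, 1), repeat=8)))

print(f"sum-max-sum MEU = {meu_dp:.4f}, brute force = {meu_bf:.4f}")
```

With imperfect recall (e.g., if d2 could no longer see c1 and d1), the max over d2 could not be pushed inside the sums this way, which is why no such closed-form recursion exists there.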
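The SPU baseline and its policy-by-policy optimality can also be sketched on a toy imperfect-recall diagram: each pass fixes all but one policy and replaces that policy by its best response, so a fixed point is only optimal one policy at a time. The diagram, utility table, and numbers below are invented for illustration; this is a sketch of SPU's idea, not the paper's implementation or benchmark.

```python
# Toy single policy updating (SPU) versus exhaustive MEU search.
# Diagram (invented, all binary, imperfect recall): chance c, decision d1
# observing c, decision d2 observing nothing, utility u(c, d1, d2).
import itertools

p_c = {0: 0.4, 1: 0.6}                                   # p(c)

def u(c, d1, d2):
    """Made-up utility table for the sketch."""
    table = {
        (0, 0, 0): 1.0, (0, 0, 1): 0.2, (0, 1, 0): 0.5, (0, 1, 1): 0.3,
        (1, 0, 0): 0.1, (1, 0, 1): 0.9, (1, 1, 0): 0.6, (1, 1, 1): 0.8,
    }
    return table[(c, d1, d2)]

def expected_utility(pol1, pol2):
    """EU(delta) = sum_c p(c) u(c, delta1(c), delta2)."""
    return sum(p_c[c] * u(c, pol1[c], pol2) for c in (0, 1))

# Exhaustive MEU: enumerate all deterministic policies.
# pol1 maps the observation c to d1; pol2 is a constant, since d2 sees nothing.
best_eu = max(
    expected_utility({0: a, 1: b}, d2)
    for a, b, d2 in itertools.product((0, 1), repeat=3)
)

# SPU: update one policy at a time, holding the others fixed, until no
# single-policy change improves the expected utility.
pol1, pol2 = {0: 0, 1: 0}, 0
improved = True
while improved:
    improved = False
    for c in (0, 1):    # best response for d1 at each observation
        best = max((0, 1), key=lambda a: expected_utility({**pol1, c: a}, pol2))
        if best != pol1[c]:
            pol1[c], improved = best, True
    best = max((0, 1), key=lambda d2: expected_utility(pol1, d2))
    if best != pol2:
        pol2, improved = best, True

spu_eu = expected_utility(pol1, pol2)
print(f"SPU EU = {spu_eu:.3f}, exhaustive MEU = {best_eu:.3f}")
# SPU fixed points are only policy-by-policy optimal, so in general
# spu_eu <= best_eu; the paper's BP fixed points satisfy a stronger
# local optimality guarantee.
assert spu_eu <= best_eu + 1e-12
```

On this particular instance SPU happens to reach the global optimum; on larger imperfect-recall diagrams it can stall at strictly suboptimal policy-by-policy fixed points, which is the gap the poster's algorithms target.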