Explore the concept of Interactive Dynamic Influence Diagrams and their potential with model clustering. Learn about computational savings, error bound, and empirical results in multiagent decision-making scenarios.
Twenty-Second Conference on Artificial Intelligence (AAAI’07) • Approximate Solutions of Interactive Dynamic Influence Diagrams Using Model Clustering • Yifeng Zeng, Aalborg University, Denmark • Prashant Doshi, Univ. of Georgia, USA • Qiongyu Chen, National University of Singapore
Outline • Interactive Dynamic Influence Diagrams (I-DIDs) • Curses of History and Dimensionality • Model Clustering • Computational Savings and Error Bound • Experimental Results
Interactive Dynamic Influence Diagrams (I-DIDs) (Doshi et al. AAMAS’07) • Graphical models for decision-making in multiagent settings • Sequential decision-making over multiple time steps in multiagent settings • Generalize dynamic IDs to multiagent domains • Differ from MAIDs (Koller&Milch01) and NIDs (Gal&Pfeffer04) • Online solutions to I-POMDPs (Gmytrasiewicz&Doshi, JAIR’05) • Allow nested modeling of agents
Overview of I-ID • A generic level l Interactive-ID (I-ID) for agent i situated with one other agent j • Model node: $M_{j,l-1}$ • Models of agent j at level l-1 • Policy link: dashed line • Distribution over the other agent’s actions given its models • Beliefs on $M_{j,l-1}$ • $P(M_{j,l-1} \mid s)$ • Update? [Figure: level l I-ID with nodes $S$, $O_i$, $A_i$, $R_i$, $A_j$ and the model node $M_{j,l-1}$]
Details of the Model Node • Members of the model node • Different chance nodes are solutions of models $m_{j,l-1}$ • Mod[$M_j$] represents the different models of agent j • CPT of the chance node $A_j$ is a multiplexer • Assumes the distribution of each of the action nodes ($A_j^1$, $A_j^2$) depending on the value of Mod[$M_j$] • $m_{j,l-1}^1$, $m_{j,l-1}^2$ could be I-IDs or IDs [Figure: model node $M_{j,l-1}$ expanded into $S$, Mod[$M_j$], the models $m_{j,l-1}^1$ and $m_{j,l-1}^2$ with their action nodes $A_j^1$ and $A_j^2$, and the chance node $A_j$]
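
The multiplexer behaviour of the chance node Aj can be illustrated with a short Python sketch. It is not taken from the talk: the model names `m1` and `m2`, the action labels, and the function names are hypothetical. Each model contributes the action distribution obtained from its own solution, and the value of Mod[Mj] selects which one Aj assumes.

```python
def multiplexer_row(action_cpts, model):
    """Row of the CPT of Aj selected when Mod[Mj] takes the value `model`."""
    return dict(action_cpts[model])


def action_distribution(model_belief, action_cpts):
    """Marginal P(Aj) = sum over m of P(Mod[Mj] = m) * P(Aj | m)."""
    actions = sorted({a for cpt in action_cpts.values() for a in cpt})
    return {
        a: sum(p_m * action_cpts[m].get(a, 0.0) for m, p_m in model_belief.items())
        for a in actions
    }


# Hypothetical example: two candidate models of agent j with deterministic policies.
action_cpts = {
    "m1": {"L": 1.0},   # solving m1 prescribes listening
    "m2": {"OL": 1.0},  # solving m2 prescribes opening the left door
}
model_belief = {"m1": 0.7, "m2": 0.3}  # agent i's belief over Mod[Mj]
predicted = action_distribution(model_belief, action_cpts)
```

Here `predicted` is agent i's distribution over j's actions after marginalizing out the model node.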
Interactive Dynamic Influence Diagrams (I-DIDs) [Figure: two time slices, t and t+1, each with nodes $S$, $O_i$, $A_i$, $R_i$, $A_j$ and $M_{j,l-1}$; the model update link connects $M_{j,l-1}^{t}$ to $M_{j,l-1}^{t+1}$]
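Semantics of Model Update Link • These models differ in their initial beliefs, each of which is the result of j updating its beliefs due to its actions and possible observations [Figure: Mod[$M_j^{t}$] with models $m_{j,l-1}^{t,1}$, $m_{j,l-1}^{t,2}$, their action nodes $A_j^1$, $A_j^2$ and observations $O_j^1$, $O_j^2$, leading to Mod[$M_j^{t+1}$] with models $m_{j,l-1}^{t+1,1}$ to $m_{j,l-1}^{t+1,4}$ and action nodes $A_j^1$ to $A_j^4$; also shown are $s^t$, $s^{t+1}$, $A_j^t$, $A_j^{t+1}$ and $O_j$]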
Curses of History and Dimensionality • Primary complexity of solving I-DIDs is due to the large number of models that must be solved over time • Curse of history of agent j • At time step t: • Curse of dimensionality • Nested property of modeling • More agents • N+1 agent setting: $(NM)^l$ models (M is the bounded number of models at each level)
Model Clustering • Idea: Prune the model space to K representative models from M candidate models, K << M, at each time step • Approach • Cluster models • k-means clustering method (MacQueen67) • Note: k is not equal to K • Clusters contain models that are likely behaviorally equivalent • Select K representative models from the clusters
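
As a rough illustration of the clustering step, the following Python sketch runs plain k-means over one-dimensional beliefs, such as P(TR) in the tiger problem. It assumes each model of j is identified by a single belief value and that the initial means are passed in as an argument; the function names and sample beliefs are hypothetical, not the authors' implementation.

```python
def assign(points, means):
    """Put every belief into the cluster of its nearest mean."""
    clusters = [[] for _ in means]
    for p in points:
        nearest = min(range(len(means)), key=lambda i: abs(p - means[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans_1d(points, initial_means, max_iter=100):
    """Alternate assignment and mean re-computation until the means stop changing."""
    means = list(initial_means)
    for _ in range(max_iter):
        clusters = assign(points, means)
        # An empty cluster keeps its previous mean.
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:
            break
        means = new_means
    return means, assign(points, means)


# Hypothetical beliefs P(TR) of eight candidate models, with k = 4 initial means.
beliefs = [0.02, 0.04, 0.3, 0.4, 0.7, 0.8, 0.97, 0.99]
means, clusters = kmeans_1d(beliefs, [0.0, 0.1, 0.9, 1.0])
```

The number of clusters k is fixed by the number of initial means, which is why it need not equal the number K of models finally retained.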
Selection of Initial Means • Facilitate clustering of behaviorally equivalent models • Behaviorally equivalent regions • Prescribe the same optimal behavior for j • [0,0.1], [0.1,0.9], [0.9,1] • Select region boundary points as initial means • 0, 0.1, 0.9, 1 [Figure: value versus P(TR) for the actions L, OL and OR, with the sensitivity points marked at 0.1 and 0.9]
Selection of Initial Means • Sensitivity points • Models that induce policies that are different from those by surrounding models • Vertices of the belief simplex • One dimension: 0, 1 • Two dimensions: [0,0], [0,1], [1,0], and [1,1]
LP for Computing Sensitivity Points • SPs are non-dominated points on intersections between value functions [Figure: intersecting value functions, with a non-dominated intersection marked as an SP]
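
For a one-dimensional belief, the same idea can be sketched without a linear program: intersect every pair of linear value functions and keep the intersections that lie on the upper envelope. The Python sketch below swaps this pairwise check in for the LP named on the slide. The reward values (-1 for listening, +10 and -100 for opening a door) are assumed single-agent tiger values chosen for illustration, and the function names are hypothetical.

```python
from itertools import combinations


def value(line, p):
    """Value of a linear value function given as (value at p=0, value at p=1)."""
    v0, v1 = line
    return v0 + (v1 - v0) * p


def sensitivity_points(lines, tol=1e-9):
    """Beliefs in [0, 1] where two value functions intersect and neither is dominated."""
    points = set()
    for a, b in combinations(lines, 2):
        slope_a, slope_b = a[1] - a[0], b[1] - b[0]
        if abs(slope_a - slope_b) < tol:
            continue  # parallel lines never intersect
        p = (b[0] - a[0]) / (slope_a - slope_b)
        if not 0.0 <= p <= 1.0:
            continue
        best = max(value(line, p) for line in lines)
        if value(a, p) >= best - tol:  # intersection lies on the upper envelope
            points.add(round(p, 9))
    return sorted(points)


# Assumed horizon-1 value functions over p = P(TR): listen, open left, open right.
tiger_lines = [(-1.0, -1.0), (-100.0, 10.0), (10.0, -100.0)]
sps = sensitivity_points(tiger_lines)
```

Under these assumed rewards the OL and OR lines cross at p = 0.5, but that crossing is dominated by listening and is discarded, leaving the two boundaries of the behaviorally equivalent regions.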
Example of Iterative Clustering [Figure: models along the P(TR) axis from 0 to 1, showing the initial means at 0, 0.1, 0.9 and 1, iterations 1 through n, and the final selection of K=10 models]
K Model Selection Algorithm • Select Initial Means • Compute SPs • Clustering • Cluster models • Re-compute means • Selection • Select K nearest models
Approximate Solution of I-DID • Exact algorithm • Expansion phase • Expand all M models over time • Look-ahead phase • Approximation – Modify exact algorithm • Prune model space using KModelSelection • Maintain only K models over time • Look-ahead phase
Computational Savings and Error Bound • $(NM)^l$ vs. $(NK)^l$ • M grows exponentially over time • Retain K models ($M_K$) and discard M-K models ($M_{/K}$) • Error bounded by finding the model among the K retained models that is the closest to the discarded one (PBVI; Pineau et al. 03)
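
A small worked example in Python may make both quantities concrete. It is a sketch under stated assumptions: models are identified by one-dimensional beliefs, the distance between models is the absolute difference of those beliefs, and the function names and numbers are hypothetical. It computes the size of the model space before and after pruning, and the largest distance from any discarded model to its closest retained model, which is the quantity the PBVI-style bound depends on.

```python
def model_space_size(n_other_agents, models_per_level, level):
    """Number of models in an N+1 agent setting: (N * M) ** l."""
    return (n_other_agents * models_per_level) ** level


def max_nearest_distance(retained, discarded):
    """Largest distance from a discarded model to its closest retained model."""
    return max(min(abs(d - r) for r in retained) for d in discarded)


# Hypothetical setting: one other agent, nesting level 2, M = 100 pruned to K = 10.
exact_size = model_space_size(1, 100, 2)
pruned_size = model_space_size(1, 10, 2)

# Hypothetical beliefs of retained and discarded models.
worst_gap = max_nearest_distance([0.1, 0.5, 0.9], [0.2, 0.65])
```

With these numbers the exact model space has 10,000 models against 100 after pruning, and the worst-case gap is the distance from the discarded belief 0.65 to the retained belief 0.5.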
Error Bound • Error bound for agent j • Expected error bound for agent i [Equations: shown as images on the original slide, not recoverable here]
Empirical Results • Two Problem Domains • Multiagent tiger • Multiagent machine maintenance • Comparison with • Exact solution of I-DID for different M • Interactive particle filtering on I-DID • Measure • Average rewards solving the level 1 I-DIDs • Variance over 50 runs • Run time
Run Time Comparison • Slower than the I-PF • Reason: convergence step • Solve I-DIDs up to 8 horizons
Future Work • Variants of model clustering • Application domains • Compose our package for I-DIDs
Notes • Updated set of models at time step (t+1) will have at most $|\mathcal{M}_{j,l-1}^{t}|\,|A_j|\,|\Omega_j|$ models • $|\mathcal{M}_{j,l-1}^{t}|$: number of models at time step t • $|A_j|$: largest space of actions • $|\Omega_j|$: largest space of observations • New distribution over the updated models uses • original distribution over the models • probability of the other agent performing the action, and • receiving the observation that led to the updated model
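
The growth of the model set and the new distribution over it can be sketched in Python for a single-agent tiger problem. Everything specific here is an assumption for illustration: the 0.85 hearing accuracy, the uniform placeholder policy, and the function names. Each model is a pair (belief P(TR), weight), and every model is expanded over all actions and observations, so the updated set reaches the stated upper limit of models × actions × observations.

```python
ACTIONS = ("L", "OL", "OR")
OBSERVATIONS = ("GL", "GR")
ACC = 0.85  # assumed probability of hearing the growl from the correct side


def obs_prob(p_tr, action, obs):
    """P(obs | belief, action); opening a door makes the growl uninformative."""
    if action != "L":
        return 0.5
    p_gr = ACC * p_tr + (1 - ACC) * (1 - p_tr)
    return p_gr if obs == "GR" else 1 - p_gr


def update_belief(p_tr, action, obs):
    """Bayesian update of P(TR); opening a door resets the tiger's location."""
    if action != "L":
        return 0.5
    likelihood_tr = ACC if obs == "GR" else 1 - ACC
    return likelihood_tr * p_tr / obs_prob(p_tr, action, obs)


def uniform_policy(p_tr):
    """Placeholder for P(action | model); a solved model would supply this."""
    return {a: 1.0 / len(ACTIONS) for a in ACTIONS}


def expand_models(models, policy=uniform_policy):
    """New weight = old weight * P(action | model) * P(observation | belief, action)."""
    updated = []
    for belief, weight in models:
        action_probs = policy(belief)
        for a in ACTIONS:
            for o in OBSERVATIONS:
                new_weight = weight * action_probs[a] * obs_prob(belief, a, o)
                updated.append((update_belief(belief, a, o), new_weight))
    return updated


models_t = [(0.5, 0.6), (0.2, 0.4)]  # two hypothetical models at time step t
models_t1 = expand_models(models_t)
```

Two models, three actions and two observations give twelve updated models, and the weights still sum to one because each factor is a proper probability.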
K Model Selection • Initial Means • Sensitivity points + Vertices of the belief simplex • Iteration • Re-compute the cluster mean • Assign new models to clusters • Selection • Select K models • $K_n$: In proportion to the size of cluster n
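
The selection step can be sketched in Python as follows. The slide only says that the share of cluster n is proportional to its size; the largest-remainder rounding used below, and the choice of the models nearest to each cluster mean, are assumptions made to turn that into something runnable. Function names and sample beliefs are hypothetical.

```python
def cluster_quotas(sizes, k):
    """Split k picks across clusters in proportion to their sizes (largest remainder)."""
    total = sum(sizes)
    if total == 0:
        return [0] * len(sizes)
    k = min(k, total)
    quotas = [k * s // total for s in sizes]
    # Hand the picks lost to rounding down to the clusters with the largest remainders.
    by_remainder = sorted(
        range(len(sizes)), key=lambda n: (k * sizes[n]) % total, reverse=True
    )
    for n in by_remainder[: k - sum(quotas)]:
        quotas[n] += 1
    return quotas


def select_k_models(clusters, k):
    """From each cluster, keep the models (1-D beliefs) closest to the cluster mean."""
    quotas = cluster_quotas([len(c) for c in clusters], k)
    selected = []
    for cluster, quota in zip(clusters, quotas):
        if not cluster:
            continue
        mean = sum(cluster) / len(cluster)
        selected.extend(sorted(cluster, key=lambda b: abs(b - mean))[:quota])
    return selected


# Hypothetical clusters of beliefs P(TR); keep K = 5 of the 10 models.
clusters = [[0.02, 0.04], [0.3, 0.35, 0.4, 0.45], [0.7, 0.8], [0.97, 0.99]]
kept = select_k_models(clusters, 5)
```

Integer arithmetic keeps the quotas exact, and no cluster is ever asked for more models than it contains as long as K does not exceed the total number of models.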