Representing Systems with Hidden State
Dorna KASHEF HAGHIGHI, Chris HUNDT*, Prakash PANANGADEN, Joelle PINEAU and Doina PRECUP
School of Computer Science, McGill University (*now at UC Berkeley)
AAAI Fall Symposium Series, November 9, 2007
How should we represent systems with hidden state?
Partially Observable Markov Decision Processes (POMDPs)
• The system is in some "true" latent state.
• We perceive observations that depend probabilistically on the state.
• A very expressive model, good for state inference and planning, but:
  • Very hard to learn from data.
  • The hidden state may be artificial (e.g. dialogue management).
Predictive representations (e.g. PSRs, OOMs, TD-nets, diversity)
• State is defined as a sufficient statistic of the past that allows predicting the future.
• Good for learning, because state depends only on observable quantities.
Our goal: understand and unify different predictive representations.
Partially Observable Markov Decision Processes
• A set of states, S
• A set of actions, A
• A set of observations, O
• A transition function: T : S × A × S → [0, 1], where T(s, a, s′) = Pr(s′ | s, a)
• An observation emission function: Ω : S × O → [0, 1], where Ω(s, o) = Pr(o | s)
• For this discussion, we omit rewards (they may be considered part of the observation vector).
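As a concrete reference for the sketches that follow, here is a minimal Python container for this tuple. The class name and the dictionary encoding are our own illustration, not from the talk:

```python
from dataclasses import dataclass

@dataclass
class POMDP:
    """Minimal POMDP container (S, A, O, T, Omega); rewards omitted."""
    states: list
    actions: list
    observations: list
    T: dict      # T[s][a][s'] = Pr(s' | s, a)
    Omega: dict  # Omega[s][o] = Pr(o | s)
```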
A simple example
• Consider the following domain: S = {s1, s2, s3, s4}, A = {N, S, E, W}
• For simplicity, assume the transitions are deterministic.
• In each square, the agent observes the color of one of the adjacent walls, O = {Red, Blue}, with equal probability.
Question: what kinds of predictions can we make about the system?
A simple example: Future predictions
Consider the following predictions:
• If I am in state s1 and go North, I will certainly see Blue.
• If I go West then North, I will certainly see Blue.
• If I go East, I will see Red with probability 0.5.
• If I go East then North, I will see Red twice with probability 0.25.
The action sequences are experiments that we can perform on the system. For each experiment, we can verify the predicted observations from data.
Tests and Experiments
• A test is a sequence of actions followed by an observation: t = a1 … an o, n ≥ 1
• An experiment is a non-empty sequence of tests: e = t1 … tm, m ≥ 1
• Note that s-tests (Littman et al., 2002) and e-tests (Rudary & Singh, 2004) are special cases of experiments.
• A prediction for an experiment e starting in s ∈ S, denoted s | e, is the conditional probability that, by doing the actions of e, we will get the predicted observations.
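A prediction s | e can be computed by forward simulation of the model. Below is a sketch under the assumption that an experiment is encoded as a list of tests, each test being a pair (actions, observation), with the observation emitted in the state reached after the test's actions; the function name and encoding are ours, building on the POMDP container above:

```python
def predict(m: POMDP, s, experiment) -> float:
    """Probability s | e for an experiment e, given as a list of tests,
    e.g. [(['E'], 'Red'), (['N'], 'Red')] for the experiment ER NR."""
    belief = {s: 1.0}   # distribution over states, conditioned on successes so far
    prob = 1.0
    for actions, obs in experiment:
        for a in actions:                       # push the belief through the actions
            nxt = {}
            for st, w in belief.items():
                for s2, p in m.T[st][a].items():
                    nxt[s2] = nxt.get(s2, 0.0) + w * p
            belief = nxt
        # probability that the predicted observation is emitted now
        p_obs = sum(w * m.Omega[st].get(obs, 0.0) for st, w in belief.items())
        if p_obs == 0.0:
            return 0.0
        prob *= p_obs                           # chain rule over the tests
        belief = {st: w * m.Omega[st].get(obs, 0.0) / p_obs
                  for st, w in belief.items()}  # condition on having seen obs
    return prob
```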
A simple example: Looking at predictions
Consider our predictions again:
• If I am in state s1 and go North, I will certainly see Blue: s1 | NB = 1
• If I go West then North, I will certainly see Blue: s | WNB = 1, ∀s ∈ S
Note that for any sequence of actions preceding the West action, the above prediction would still be the same.
Equivalence relations
• Two experiments are equivalent if their predictions are the same for every state:
  e1 ~ e2 ⟺ s | e1 = s | e2, ∀s ∈ S
  Note: if two experiments always give the same results, they are redundant, and only one is necessary.
• Two states are equivalent if they cannot be distinguished by any experiment:
  s1 ~ s2 ⟺ s1 | e = s2 | e, ∀e
  Note: equivalent states produce the same probability distribution over future trajectories, so they are redundant.
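In practice, these relations can only be tested against a finite set of experiments. A minimal sketch, assuming we enumerate all experiments up to a fixed size (the enumeration helper and the depth bounds are our own simplification, not part of the talk):

```python
from itertools import product

def enumerate_experiments(m: POMDP, max_tests, max_actions):
    """All experiments with up to max_tests tests of up to max_actions actions each."""
    tests = [(list(acts), o)
             for n in range(1, max_actions + 1)
             for acts in product(m.actions, repeat=n)
             for o in m.observations]
    for k in range(1, max_tests + 1):
        for combo in product(tests, repeat=k):
            yield list(combo)

def states_equivalent(m: POMDP, s1, s2, max_tests=2, max_actions=2) -> bool:
    """Approximate s1 ~ s2: equal predictions on all bounded experiments."""
    return all(abs(predict(m, s1, e) - predict(m, s2, e)) < 1e-9
               for e in enumerate_experiments(m, max_tests, max_actions))
```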
A simple example: Equivalent predictions
• Consider the following experiment: NRNR
• It is equivalent to SRSR, NRSR, NNRSSSR, …
• This is an infinite equivalence class, which we denote by a chosen exemplar, e.g. [NRNR].
• The predictions for this class: s1 | [NRNR] = 0, s2 | [NRNR] = 0.25
Dual perspectives
• Forward view: given a certain state, what predictions can we make about the future?
  • In classical AI, this view enables forward planning.
  • It is centered around the notion of state.
• Backward view: suppose we want a certain experiment to succeed; in what state should the system initially be?
  • This view enables backward planning.
  • It is centered around the experiments.
A simple example: Dual perspectives
• Forward view:
  Q: If we know that the system is in s1, what predictions can we make about the future?
A simple example: Dual perspectives
• Backward view:
  Q: Suppose we want the experiment NR to succeed; in what state should the system be?
  A: If the system starts in either state s2 or s4, the test will succeed with probability 0.5.
• We can therefore associate with the experiment NR a vector of predictions of how likely it is to succeed from every state: [0, 0.5, 0, 0.5]ᵀ
The dual machine
• The backward view can be implemented in a dual machine:
  • States of the dual machine are equivalence classes of experiments, [e].
  • Observations of the dual machine are states of the original machine.
  • The emission function represents the prediction probability s | [e], for each s ∈ S.
  • The transition function is deterministic: taking action a moves [e] to [ae] (the action is prepended to the experiment).
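The fragment of the dual reachable from one-observation experiments can be built mechanically: represent each class [e] by its prediction vector over S (equal vectors mean equivalent experiments), and prepend actions until no new vectors appear. A sketch, reusing the hypothetical predict above:

```python
def build_dual_fragment(m: POMDP, max_len=4):
    """Dual-machine fragment for experiments with one observation.

    A class [e] is keyed by its (rounded) prediction vector over S;
    transitions follow [e] --a--> [ae], i.e. action a is prepended."""
    key = lambda acts, o: tuple(round(predict(m, s, [(acts, o)]), 9)
                                for s in m.states)
    frontier = [([a], o) for a in m.actions for o in m.observations]
    classes = {}     # prediction vector -> exemplar experiment
    while frontier:
        acts, o = frontier.pop()
        k = key(acts, o)
        if k not in classes:
            classes[k] = (acts, o)
            if len(acts) < max_len:          # bounded exploration
                frontier.extend(([a] + acts, o) for a in m.actions)
    # deterministic transitions between classes
    trans = {(k, a): key([a] + acts, o)
             for k, (acts, o) in classes.items() for a in m.actions}
    return classes, trans
```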
A simple example: A fragment of the dual machine
[Figure: the original four-state machine next to a fragment of its dual. The dual states shown are [NR], [NB], [WR], [ER] and [WB]; emissions give the prediction from each original state, e.g. 0 for s1 and s3 and 0.5 for s2 and s4 under [NR], and 1 for s1 and s3 and 0.5 for s2 and s4 under [NB].]
• This fragment of the dual machine captures experiments with 1 observation.
• E.g. [NR] transitions to [WR] under action W, because s | WNR = s | WR, ∀s.
• There are separate fragments for experiments with 2 observations, 3 observations, etc.
Notes on the dual machine
• The dual provides, for each experiment, the set of states from which the experiment succeeds.
• Note that the emission function is not normalized.
  • Given an initial state distribution, we can get proper probabilities Pr(s | [e]).
• Experiments with different numbers of observations usually end up in disconnected components.
• Arcs represent temporal-difference relations, similar to those in TD-nets (Sutton & Tanner, 2005).
  • This is consistent with previous observations (Rudary & Singh, 2004) that e-tests yield TD-relationships and s-tests do not.
Can we do this again?
• In the dual, we get a proper machine, with states, actions, transitions, and emissions.
• Can we think about experiments on the dual machine?
  • Repeat the previous transformations on the dual machine.
  • Consider classes of equivalent experiments.
  • Reverse the roles of experiments and states.
• What do we obtain?
The double dual machine
• States of the double dual machine are bundles of predictions for all possible experiments, e.g. [s1]′ and [s2]′.
• Equivalence classes of the type [s]′ can be viewed as homing sequences (Even-Dar et al., 2005).
• The double dual assigns the same probability to any experiment as the original machine, so the two machines are equivalent.
• The double dual is always a deterministic system! (But it can be much larger than the original machine.)
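One way to see the determinism: read a double-dual state as the function f mapping each experiment (class) to its prediction; taking action a then sends f to fₐ with fₐ(e) = f(a·e), an update that involves no randomness even when the underlying transitions are stochastic. A sketch of that update under our encoding of experiments (this reading is our interpretation, not spelled out on the slide):

```python
def double_dual_step(f, a):
    """Deterministic double-dual transition: the new prediction bundle
    answers experiment e with the old bundle's answer to a.e.

    f maps an experiment (a tuple of (actions, observation) tests)
    to a probability; we prepend the action to the first test."""
    def prepend(e):
        (acts, o), rest = e[0], e[1:]
        return (((a,) + tuple(acts), o),) + tuple(rest)
    return lambda e: f(prepend(e))
```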
A simple example: The double dual machine
[Figure: the original machine, the dual fragment from the previous slide, and the double dual. The double dual has two states, S1 and S2: actions N and S are self-loops, E leads to S2, and W leads to S1. Each state carries its bundle of predictions, e.g. S1: NR = 0, NB = 1, ER = 0.5, WB = 1, WR = 0, …; S2: NR = 0.5, NB = 0.5, ER = 0.5, WB = 1, WR = 0, ….]
• Equivalent states are eliminated.
• Two simple homing sequences:
  • Action W forces the system into S1.
  • Action E forces the system into S2.
Conjecture: Different representations are useful for different tasks
• Learn the double dual.
  • Advantage: it is deterministic.
  • Problem: in general, the double dual is an infinite representation. (In our example it is compact because the original transitions are deterministic.)
  • Remedy: focus on predicting accurately only the results of some experiments.
• Plan with the dual (see the sketch below).
  • For a given experiment, the dual tells us its probability of success from every state.
  • Given an initial state distribution, search over experiments to find one with a high prediction probability with respect to the goal criteria.
  • Start with dual fragments with short experiments, then move to longer ones.
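A minimal sketch of that search, assuming a goal predicate over experiments (goal_test) and the enumeration helper from earlier; both names are our own, not from the talk:

```python
def plan_with_dual(m: POMDP, b0, goal_test, max_tests=2, max_actions=3):
    """Search experiments in order of size; score each by its success
    probability under the initial state distribution b0 (a dict s -> prob)."""
    best, best_score = None, 0.0
    for e in enumerate_experiments(m, max_tests, max_actions):
        if not goal_test(e):                     # keep only goal-relevant experiments
            continue
        score = sum(b0[s] * predict(m, s, e) for s in m.states)  # Pr(e succeeds)
        if score > best_score:
            best, best_score = e, score
    return best, best_score
```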
A simple learning algorithm
Consider the following non-deterministic automaton:
• A set of states, S
• A set of actions, A
• A set of observations, O
• A joint transition-emission relation: δ ⊆ S × A × O × S
Can we learn this automaton (or an equivalent one) directly from data?
Merge-split algorithm
• Define:
  • Histories: h = a1 o1 a2 o2 … am om
  • The empty history: ε
• Construct a "history" automaton, H.
• Algorithm:
  • Start with one state, corresponding to the empty history: H = {ε}.
  • Consider all possible next states, h′ = hao.
  • The merge operation checks for an equivalent existing state: h′ ~ h″ ⟺ ⟦h′⟧ = ⟦h″⟧, where ⟦h⟧ is the set of all possible future trajectories after h. If one is found, we set the transition function accordingly: δ(h, ao) = h″.
  • Otherwise the split operation is applied: H = H ∪ {h′}, δ(h, ao) = h′.
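A sketch of merge-split, assuming access to a black box futures(h) that returns a hashable signature of the future trajectories possible after history h, up to a fixed horizon (e.g. a frozenset of bounded-length action-observation sequences); that oracle, the breadth-first order, and the depth cap are our own simplifications:

```python
from collections import deque

def merge_split(actions, observations, futures, max_depth=6):
    """Learn a deterministic history automaton by merging and splitting.

    States are equivalence classes of histories, keyed by futures(h);
    futures(h) returns None if history h is impossible in the system."""
    empty = ()
    states = {futures(empty): empty}   # signature -> exemplar history
    delta = {}                         # (exemplar, (a, o)) -> exemplar
    queue = deque([empty])
    while queue:
        h = queue.popleft()
        if len(h) // 2 >= max_depth:   # bound the exploration depth
            continue
        for a in actions:
            for o in observations:
                h2 = h + (a, o)
                sig = futures(h2)
                if sig is None:        # (a, o) cannot follow h
                    continue
                if sig in states:      # merge: an equivalent state exists
                    delta[(h, (a, o))] = states[sig]
                else:                  # split: create a new state
                    states[sig] = h2
                    delta[(h, (a, o))] = h2
                    queue.append(h2)
    return states, delta
```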
Example
[Figure: the flip automaton (Holmes & Isbell, 2006) alongside the automaton learned by merge-split.]
Comments
• Merge-split constructs a deterministic history automaton.
• There is a finite number of equivalence classes of histories.
  • Worst case: the size is exponential in the number of states of the original machine.
• The automaton is well defined (i.e. it makes the same predictions as the original model).
• It is the minimal such automaton.
• Extending this to probabilistic machines is somewhat messy… but we are working on it.
Final discussion
• It is interesting to consider the same dynamical system from different perspectives.
• There is a notion of duality between state and experiment.
  • Such a notion of duality is not new, e.g. observability vs. controllability in systems theory.
• There is a large body of existing work on learning automata, which I did not comment on [Rivest & Schapire '94; James & Singh '05; Holmes & Isbell '06; …].
• Many interesting questions remain:
  • Can we develop a sound approximation theory for our duality?
  • Can we extend this to continuous systems?
  • Can we extend the learning algorithm to probabilistic systems?