This presentation describes a tractable algorithm for efficiently learning linear-linear exponential family predictive state representations (LL-EFPSRs), and evaluates it experimentally.
Efficiently Learning Linear-Linear Exponential Family Predictive Representations of State
David Wingate* (wingated@mit.edu), University of Michigan
Satinder Singh (baveja@umich.edu), University of Michigan
*Now a postdoc at MIT
Outline
• The Exponential Family PSR
• The Linear-Linear Exponential Family PSR
• A tractable learning algorithm (> NIPS)
• Experiments and conclusions
The Exponential Family PSR
Modeling Dynamical Systems
[Figure: a timeline of action-observation pairs, with the past ao_{t-2}, ao_{t-1}, ao_t summarized as the history h_t, and the future AO_{t+1}, AO_{t+2}, AO_{t+3}, AO_{t+4}, ... extending indefinitely: F^∞ | h_t]
The goal: model the conditional distribution of the infinite future given any history.
Modeling Dynamical Systems
[Figure: the same timeline, with only the next n action-observation pairs highlighted as the short-term future F^n | h_t]
Central PSR assumption: the parameters describing the conditional distribution of the short-term future are state.
Examples of PSRs
Central assumption: the parameters describing the conditional distribution of the short-term future are state.
• Discrete observations. State: expectations of a core set of indicator random variables, or probabilities of specific core tests. Covers HMMs / POMDPs.
• Continuous observations. State: parameters of the Gaussian distribution modeling the next n observations. Covers linear dynamical systems (Kalman filter systems).
Distribution of Short-Term Future
Second (EFPSR) assumption: the distribution over the short-term future has an exponential family form:
p(F^n | h_t) = exp{ s_t^T f(F^n) - log Z(s_t) }
These parameters s_t will be our state!
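To make the exponential family form concrete, here is a minimal Python sketch over a toy future space; the space of futures, the feature function, and the example state are all invented for illustration and are not from the paper:

```python
import itertools
import numpy as np

# Toy future space: all length-2 binary observation sequences.
futures = [np.array(b, dtype=float) for b in itertools.product([0, 1], repeat=2)]

def feature_fn(y):
    # Hypothetical feature function f(.): the raw bits plus one pairwise product.
    return np.array([y[0], y[1], y[0] * y[1]])

def log_Z(s):
    # Log-partition function: log sum_y exp(s . f(y)) over all futures y.
    return np.logaddexp.reduce([s @ feature_fn(y) for y in futures])

def prob(y, s):
    # p(F^n = y | h_t) = exp(s . f(y) - log Z(s)); the state s supplies
    # the natural parameters of the exponential family.
    return np.exp(s @ feature_fn(y) - log_Z(s))

s_t = np.array([0.5, -0.2, 1.0])                 # an example state
print(sum(prob(y, s_t) for y in futures))        # sums to 1.0
```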
Maintaining State
[Figure: the modeled window of the future slides forward one step, from AO_{t+1} ... AO_{t+n} given h_t to AO_{t+2} ... AO_{t+n+1} given h_{t+1}]
s_t^+ = extend(s_t, θ)
s_{t+1} = condition(s_t^+, o_{t+1})
Learning an EFPSR
Given a trajectory of T data points, there are three things to learn:
• The model parameters θ, used in s_t^+ = extend(s_t, θ)
• The dimension n of the modeled future window
• The features of the future f(F^n)
The Linear-Linear Exponential Family PSR
Linear-Linear EFPSR
Linear extension function: s_t^+ = extend(s_t, θ) = θ s_t
Linear conditioning: s_{t+1} = G(o_{t+1}) s_t^+
Overall state update is linear: s_{t+1} = G(o_{t+1}) θ s_t
Two useful properties:
1) Maximum likelihood gradients are easy to derive
2) Linearity will make efficient approximations possible
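A minimal numpy sketch of this linear update; the state dimension, the observation alphabet, and the random construction of θ and the G(o) matrices are placeholders, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_obs = 8, 3                                   # state size, observation alphabet

theta = rng.normal(scale=0.1, size=(k, k))        # linear extension parameters
G = [rng.normal(scale=0.1, size=(k, k)) for _ in range(n_obs)]  # one conditioning matrix per observation

def update(s, o):
    s_plus = theta @ s                            # extension:    s+ = theta s
    return G[o] @ s_plus                          # conditioning: s' = G(o) s+

s = rng.normal(size=k)
for o in [0, 2, 1, 1]:                            # a stream of observations
    s = update(s, o)                              # overall: s' = G(o) theta s
```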
Exact Likelihood for ML Learning
Exact log-likelihood of the data: LL(θ) = Σ_{t=1}^{T} [ s_t^T f_t - log Z(s_t) ], where f_t are the observed features of the next n observations at time t.
Importantly, the model is fully observed: no latent states appear in the expression for the likelihood.
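A sketch of the exact log-likelihood on a toy enumerable future space (all sizes and names hypothetical); note that every term depends only on observed data and states computed from it, with no latent variables:

```python
import itertools
import numpy as np

futures = np.array(list(itertools.product([0, 1], repeat=2)), dtype=float)  # toy f(y) = raw bits

def log_Z(s):
    # Log-partition over the enumerable toy future space.
    return np.logaddexp.reduce(futures @ s)

def exact_log_likelihood(states, feats):
    # sum_t [ s_t . f_t - log Z(s_t) ], where feats[t] holds the observed
    # features of the next n observations at time t.
    return sum(s @ f - log_Z(s) for s, f in zip(states, feats))

rng = np.random.default_rng(1)
states = rng.normal(size=(50, 2))                 # stand-in trajectory of states
feats = futures[rng.integers(0, 4, size=50)]      # stand-in observed features
print(exact_log_likelihood(states, feats))
```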
Results of Exact ML on POMDPs
Maximum likelihood learning via gradient ascent.
Metric: data likelihood under the true model, compared to the likelihood under the learned EFPSR. Model quality is the fraction of the gap between the naïve and true models that the EFPSR closes.
Unfortunately, exact learning is intractable in large domains.
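One way to express that model-quality metric in code; the exact normalization is an assumption based on the slide's wording:

```python
def model_quality(ll_learned, ll_naive, ll_true):
    # Fraction of the naive-to-true likelihood gap the learned model closes:
    # 0 means no better than the naive model, 1 means it matches the true model.
    return (ll_learned - ll_naive) / (ll_true - ll_naive)
```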
A Tractable Learning Algorithm
Why is Exact ML Intractable?
• Cannot perform exact inference for the gradients → approximate inference
• Naïve parameterization yields too many parameters (s_t^+ = extend(s_t, θ) = θ s_t) → low-rank approximation
• Cannot do exact inference T times per gradient step → approximate likelihood
Approximate Likelihood for ML Learning
Exact log-likelihood of the data: LL(θ) = Σ_t [ s_t^T f_t - log Z(s_t) ]
A double application of Jensen's inequality, plus a zero-covariance assumption, yields an approximate lower bound on the (per-step) likelihood of the data:
LL(θ)/T ≈ s̄^T f̄ - log Z(s̄), where s̄ = E[s_t] and f̄ = E[f_t].
This bound could be used for other models / algorithms.
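A sketch of evaluating the approximate quantity on the same kind of toy setup (hypothetical sizes; here s̄ and f̄ are simple sample averages, whereas the algorithm computes them from stationary distributions):

```python
import itertools
import numpy as np

futures = np.array(list(itertools.product([0, 1], repeat=2)), dtype=float)

def log_Z(s):
    return np.logaddexp.reduce(futures @ s)

def approx_log_likelihood(states, feats):
    # Per-step bound: replace the per-timestep terms with their
    # stationary (time-averaged) counterparts.
    s_bar = states.mean(axis=0)          # unconditional expected parameters
    f_bar = feats.mean(axis=0)           # unconditional expected features
    return s_bar @ f_bar - log_Z(s_bar)

rng = np.random.default_rng(2)
states = rng.normal(size=(100, 2))                # stand-in trajectory of states
feats = futures[rng.integers(0, 4, size=100)]     # stand-in observed features
print(approx_log_likelihood(states, feats))
```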
Interpretation of Approximate Likelihood
s̄: the unconditional expected parameters. For the EFPSR, this is the stationary distribution of states! For the LL-EFPSR, it can be computed as the solution to a linear system of equations based on the stationary distribution of observations.
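One plausible reading of that linear system, sketched below under the zero-covariance assumption from the previous slide; the paper's exact construction may differ. Averaging the update s_{t+1} = G(o_{t+1}) θ s_t over the stationary observation distribution makes s̄ a fixed point of the averaged update matrix:

```python
import numpy as np

def stationary_state(G, p_obs, theta):
    # E[s_{t+1}] = (sum_o p(o) G(o)) theta E[s_t] under a zero-covariance
    # assumption, so s_bar is an eigenvector of M = E[G] theta with
    # eigenvalue 1 (determined up to scale; normalized here for the sketch).
    M = sum(p * G_o for p, G_o in zip(p_obs, G)) @ theta
    vals, vecs = np.linalg.eig(M)
    i = np.argmin(np.abs(vals - 1.0))    # eigenvalue closest to 1
    s_bar = np.real(vecs[:, i])
    return s_bar / np.linalg.norm(s_bar)

rng = np.random.default_rng(3)
k, n_obs = 4, 3
G = [rng.normal(scale=0.3, size=(k, k)) for _ in range(n_obs)]
p_obs = np.full(n_obs, 1.0 / n_obs)      # stationary observation probabilities
theta = rng.normal(scale=0.3, size=(k, k))
print(stationary_state(G, p_obs, theta))
```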
Interpretation of Approximate Likelihood
f̄: the unconditional expected features. This is the stationary distribution of features, computed once from the data.
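This quantity is a direct empirical average; a sketch with a hypothetical feature function:

```python
import numpy as np

def empirical_feature_mean(feature_fn, observations, n):
    # f_bar = (1/T) sum_t f(next n observations after t), computed once
    # from the training trajectory.
    windows = [observations[t:t + n] for t in range(len(observations) - n + 1)]
    return np.mean([feature_fn(w) for w in windows], axis=0)

rng = np.random.default_rng(4)
obs = rng.integers(0, 2, size=(1000, 5))                 # toy binary observations
flatten = lambda w: np.asarray(w).ravel().astype(float)  # hypothetical f(.)
print(empirical_feature_mean(flatten, obs, n=2))         # length-10 vector
```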
Interpretation of Approximate Likelihood
log Z(s̄): the log-partition function evaluated using the stationary distribution of states.
Interpretation of Approximate Likelihood
Gradient: the expected features of the future induced by the model, minus the expected features of the future observed in the data.
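The moment-difference core of that gradient on the toy future space; this deliberately ignores the chain-rule factor that maps the difference back to the parameters θ:

```python
import itertools
import numpy as np

futures = np.array(list(itertools.product([0, 1], repeat=2)), dtype=float)

def moment_gap(s_bar, f_bar):
    # Expected features under the model at s_bar, minus the expected
    # features observed in the data; zero when the moments match.
    p = np.exp(futures @ s_bar - np.logaddexp.reduce(futures @ s_bar))
    return p @ futures - f_bar

print(moment_gap(np.zeros(2), np.array([0.7, 0.4])))
```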
Interpretation of Approximate Likelihood
In other words: find model parameters θ such that the model's stationary distribution of features matches the empirical stationary distribution of features. This is tractable because the model is fully observed and the state update is linear.
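A minimal moment-matching loop on the toy model. In the LL-EFPSR the free parameters are θ, which determine s̄; purely for illustration, this sketch ascends in s̄ directly:

```python
import itertools
import numpy as np

futures = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
f_bar = np.array([0.7, 0.2, 0.5])            # empirical stationary features
s_bar = np.zeros(3)
for _ in range(500):
    p = np.exp(futures @ s_bar - np.logaddexp.reduce(futures @ s_bar))
    s_bar += 0.5 * (f_bar - p @ futures)     # ascend the approximate likelihood
print(p @ futures)                           # ~ [0.7, 0.2, 0.5] at convergence
```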
Experiments
Evaluating with RL
Approximate inference and an approximate likelihood create a problem: how do you evaluate the learned model? Solution: evaluate the model by using it for reinforcement learning.
Domains: Cheesemaze and the 4x3 Maze.
Example: Bouncing Ball Problem
• Noise-free observations: 110 possible observations; 2nd-order Markov
• Noisy observations: 2^110 possible observations; no longer 2nd-order Markov
Example: Bouncing Ball Problem
Each observation is an array of binary random variables.
[Figure: the next two observations O_{t+1}, O_{t+2} form the modeled window F^n | h_t, with features of the future f(F^2 | h_t)]
Example: Bouncing Ball Problem
Each observation is an array of binary random variables.
[Figure: growing the window to three observations O_{t+1}, O_{t+2}, O_{t+3}, i.e. F^{n+1} | h_t, with features of the future f(F^3 | h_t)]
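A sketch of one natural feature construction for this domain; the 10x11 image size (110 pixels) and the choice of raw stacked bits are assumptions, not the paper's specification:

```python
import numpy as np

def future_features(obs_window):
    # f(F^n | h_t): stack the binary pixel arrays of the next n
    # observations into a single feature vector.
    return np.concatenate([np.asarray(o).ravel() for o in obs_window]).astype(float)

rng = np.random.default_rng(5)
window = [rng.integers(0, 2, size=(10, 11)) for _ in range(3)]  # O_{t+1..t+3}
print(future_features(window).shape)                            # (330,) for n = 3
```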
Bouncing Ball Results
The model generalizes across observations, and learning is efficient.
The Robot Domain
[Figure: a sequence of camera images O_{t+1}, O_{t+2}, O_{t+3}]
• Observations are camera images; about 1,000 binary features are extracted from each
• n = 3; f(F^n | h_t) constructs about 12,000 features
• 200,000 training samples
• NMF inference; low-rank approximation; rank-aware line search
Robot Domain Results
The learned model outperforms the best 1st-order Markov model and random policies, a significant accomplishment for PSRs.
Conclusions
• Learning an LL-EFPSR model is straightforward: the expression for ML is defined in terms of observable quantities, and the gradient is analytically tractable
• The model is compatible with an approximate likelihood that has an interpretation based on stationary distributions
• Encouraging experimental results: almost perfect models of small systems, and the ability to start tackling domains larger than any other PSR model
• Future work: tractability, approximations, convexity, feature extraction