This presentation describes a tractable algorithm for efficiently learning linear-linear exponential family predictive state representations (LL-EFPSRs), and evaluates it experimentally.
Efficiently Learning Linear-Linear Exponential Family Predictive Representations of State
David Wingate* (wingated@mit.edu), University of Michigan
Satinder Singh (baveja@umich.edu), University of Michigan
*Now a postdoc at MIT
Outline
• The Exponential Family PSR
• The Linear-Linear Exponential Family PSR
• A tractable learning algorithm (> NIPS)
• Experiments and conclusions
The Exponential Family PSR
Modeling Dynamical Systems
[Figure: a timeline of action-observation pairs, with the past ao_{t-2}, ao_{t-1}, ao_t summarized as the history h_t, and the future AO_{t+1}, AO_{t+2}, AO_{t+3}, AO_{t+4}, ... extending indefinitely: F^∞ | h_t]
The goal: model the conditional distribution of the infinite future given any history.
Modeling Dynamical Systems
[Figure: the same timeline, with only the next n action-observation pairs highlighted as the short-term future F^n | h_t]
Central PSR assumption: the parameters describing the conditional distribution of the short-term future are state.
Examples of PSRs
Central assumption: the parameters describing the conditional distribution of the short-term future are state.
• Discrete observations. State: expectations of a core set of indicator random variables, or probabilities of specific core tests. Covers HMMs / POMDPs.
• Continuous observations. State: parameters of the Gaussian distribution modeling the next n observations. Covers linear dynamical systems (Kalman filter systems).
Distribution of Short-Term Future
Second (EFPSR) assumption: the distribution over the short-term future has an exponential family form:
p(F^n | h_t) = exp{ s_t^T f(F^n) - log Z(s_t) }
These parameters s_t will be our state!
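To make the exponential family form concrete, here is a minimal Python sketch over a toy future space; the space of futures, the feature function, and the example state are all invented for illustration and are not from the paper:

```python
import itertools
import numpy as np

# Toy future space: all length-2 binary observation sequences.
futures = [np.array(b, dtype=float) for b in itertools.product([0, 1], repeat=2)]

def feature_fn(y):
    # Hypothetical feature function f(.): the raw bits plus one pairwise product.
    return np.array([y[0], y[1], y[0] * y[1]])

def log_Z(s):
    # Log-partition function: log sum_y exp(s . f(y)) over all futures y.
    return np.logaddexp.reduce([s @ feature_fn(y) for y in futures])

def prob(y, s):
    # p(F^n = y | h_t) = exp(s . f(y) - log Z(s)); the state s supplies
    # the natural parameters of the exponential family.
    return np.exp(s @ feature_fn(y) - log_Z(s))

s_t = np.array([0.5, -0.2, 1.0])                 # an example state
print(sum(prob(y, s_t) for y in futures))        # sums to 1.0
```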
Maintaining State
[Figure: the modeled window of the future slides forward one step, from AO_{t+1} ... AO_{t+n} given h_t to AO_{t+2} ... AO_{t+n+1} given h_{t+1}]
s_t^+ = extend(s_t, θ)
s_{t+1} = condition(s_t^+, o_{t+1})
Learning an EFPSR
Given a trajectory of T data points, there are three things to learn:
• The model parameters θ, used in s_t^+ = extend(s_t, θ)
• The dimension n of the modeled future window
• The features of the future f(F^n)
The Linear-Linear Exponential Family PSR
Linear-Linear EFPSR
Linear extension function: s_t^+ = extend(s_t, θ) = θ s_t
Linear conditioning: s_{t+1} = G(o_{t+1}) s_t^+
Overall state update is linear: s_{t+1} = G(o_{t+1}) θ s_t
Two useful properties:
1) Maximum likelihood gradients are easy to derive
2) Linearity will make efficient approximations possible
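A minimal numpy sketch of this linear update; the state dimension, the observation alphabet, and the random construction of θ and the G(o) matrices are placeholders, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_obs = 8, 3                                   # state size, observation alphabet

theta = rng.normal(scale=0.1, size=(k, k))        # linear extension parameters
G = [rng.normal(scale=0.1, size=(k, k)) for _ in range(n_obs)]  # one conditioning matrix per observation

def update(s, o):
    s_plus = theta @ s                            # extension:    s+ = theta s
    return G[o] @ s_plus                          # conditioning: s' = G(o) s+

s = rng.normal(size=k)
for o in [0, 2, 1, 1]:                            # a stream of observations
    s = update(s, o)                              # overall: s' = G(o) theta s
```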
Exact Likelihood for ML Learning
Exact log-likelihood of the data: LL(θ) = Σ_{t=1}^{T} [ s_t^T f_t - log Z(s_t) ], where f_t are the observed features of the next n observations at time t.
Importantly, the model is fully observed: no latent states appear in the expression for the likelihood.
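A sketch of the exact log-likelihood on a toy enumerable future space (all sizes and names hypothetical); note that every term depends only on observed data and states computed from it, with no latent variables:

```python
import itertools
import numpy as np

futures = np.array(list(itertools.product([0, 1], repeat=2)), dtype=float)  # toy f(y) = raw bits

def log_Z(s):
    # Log-partition over the enumerable toy future space.
    return np.logaddexp.reduce(futures @ s)

def exact_log_likelihood(states, feats):
    # sum_t [ s_t . f_t - log Z(s_t) ], where feats[t] holds the observed
    # features of the next n observations at time t.
    return sum(s @ f - log_Z(s) for s, f in zip(states, feats))

rng = np.random.default_rng(1)
states = rng.normal(size=(50, 2))                 # stand-in trajectory of states
feats = futures[rng.integers(0, 4, size=50)]      # stand-in observed features
print(exact_log_likelihood(states, feats))
```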
Results of Exact ML on POMDPs
Maximum likelihood learning via gradient ascent.
Metric: data likelihood under the true model, compared to the likelihood under the learned EFPSR. Model quality is the fraction of the gap between the naïve and true models that the EFPSR closes.
Unfortunately, exact learning is intractable in large domains.
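One way to express that model-quality metric in code; the exact normalization is an assumption based on the slide's wording:

```python
def model_quality(ll_learned, ll_naive, ll_true):
    # Fraction of the naive-to-true likelihood gap the learned model closes:
    # 0 means no better than the naive model, 1 means it matches the true model.
    return (ll_learned - ll_naive) / (ll_true - ll_naive)
```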
A Tractable Learning Algorithm
Why is Exact ML Intractable?
• Cannot perform exact inference for the gradients → approximate inference
• Naïve parameterization yields too many parameters (s_t^+ = extend(s_t, θ) = θ s_t) → low-rank approximation
• Cannot do exact inference T times per gradient step → approximate likelihood
Approximate Likelihood for ML Learning
Exact log-likelihood of the data: LL(θ) = Σ_t [ s_t^T f_t - log Z(s_t) ]
A double application of Jensen's inequality, plus a zero-covariance assumption, yields an approximate lower bound on the (per-step) likelihood of the data:
LL(θ)/T ≈ s̄^T f̄ - log Z(s̄), where s̄ = E[s_t] and f̄ = E[f_t].
This bound could be used for other models / algorithms.
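A sketch of evaluating the approximate quantity on the same kind of toy setup (hypothetical sizes; here s̄ and f̄ are simple sample averages, whereas the algorithm computes them from stationary distributions):

```python
import itertools
import numpy as np

futures = np.array(list(itertools.product([0, 1], repeat=2)), dtype=float)

def log_Z(s):
    return np.logaddexp.reduce(futures @ s)

def approx_log_likelihood(states, feats):
    # Per-step bound: replace the per-timestep terms with their
    # stationary (time-averaged) counterparts.
    s_bar = states.mean(axis=0)          # unconditional expected parameters
    f_bar = feats.mean(axis=0)           # unconditional expected features
    return s_bar @ f_bar - log_Z(s_bar)

rng = np.random.default_rng(2)
states = rng.normal(size=(100, 2))                # stand-in trajectory of states
feats = futures[rng.integers(0, 4, size=100)]     # stand-in observed features
print(approx_log_likelihood(states, feats))
```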
Interpretation of Approximate Likelihood
s̄: the unconditional expected parameters. For the EFPSR, this is the stationary distribution of states! For the LL-EFPSR, it can be computed as the solution to a linear system of equations based on the stationary distribution of observations.
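One plausible reading of that linear system, sketched below under the zero-covariance assumption from the previous slide; the paper's exact construction may differ. Averaging the update s_{t+1} = G(o_{t+1}) θ s_t over the stationary observation distribution makes s̄ a fixed point of the averaged update matrix:

```python
import numpy as np

def stationary_state(G, p_obs, theta):
    # E[s_{t+1}] = (sum_o p(o) G(o)) theta E[s_t] under a zero-covariance
    # assumption, so s_bar is an eigenvector of M = E[G] theta with
    # eigenvalue 1 (determined up to scale; normalized here for the sketch).
    M = sum(p * G_o for p, G_o in zip(p_obs, G)) @ theta
    vals, vecs = np.linalg.eig(M)
    i = np.argmin(np.abs(vals - 1.0))    # eigenvalue closest to 1
    s_bar = np.real(vecs[:, i])
    return s_bar / np.linalg.norm(s_bar)

rng = np.random.default_rng(3)
k, n_obs = 4, 3
G = [rng.normal(scale=0.3, size=(k, k)) for _ in range(n_obs)]
p_obs = np.full(n_obs, 1.0 / n_obs)      # stationary observation probabilities
theta = rng.normal(scale=0.3, size=(k, k))
print(stationary_state(G, p_obs, theta))
```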
Interpretation of Approximate Likelihood
f̄: the unconditional expected features. This is the stationary distribution of features, computed once from the data.
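This quantity is a direct empirical average; a sketch with a hypothetical feature function:

```python
import numpy as np

def empirical_feature_mean(feature_fn, observations, n):
    # f_bar = (1/T) sum_t f(next n observations after t), computed once
    # from the training trajectory.
    windows = [observations[t:t + n] for t in range(len(observations) - n + 1)]
    return np.mean([feature_fn(w) for w in windows], axis=0)

rng = np.random.default_rng(4)
obs = rng.integers(0, 2, size=(1000, 5))                 # toy binary observations
flatten = lambda w: np.asarray(w).ravel().astype(float)  # hypothetical f(.)
print(empirical_feature_mean(flatten, obs, n=2))         # length-10 vector
```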
Interpretation of Approximate Likelihood
log Z(s̄): the log-partition function evaluated using the stationary distribution of states.
Interpretation of Approximate Likelihood
Gradient: the expected features of the future induced by the model, minus the expected features of the future observed in the data.
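The moment-difference core of that gradient on the toy future space; this deliberately ignores the chain-rule factor that maps the difference back to the parameters θ:

```python
import itertools
import numpy as np

futures = np.array(list(itertools.product([0, 1], repeat=2)), dtype=float)

def moment_gap(s_bar, f_bar):
    # Expected features under the model at s_bar, minus the expected
    # features observed in the data; zero when the moments match.
    p = np.exp(futures @ s_bar - np.logaddexp.reduce(futures @ s_bar))
    return p @ futures - f_bar

print(moment_gap(np.zeros(2), np.array([0.7, 0.4])))
```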
Interpretation of Approximate Likelihood
In other words: find model parameters θ such that the model's stationary distribution of features matches the empirical stationary distribution of features. This is tractable because the model is fully observed and the state update is linear.
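A minimal moment-matching loop on the toy model. In the LL-EFPSR the free parameters are θ, which determine s̄; purely for illustration, this sketch ascends in s̄ directly:

```python
import itertools
import numpy as np

futures = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
f_bar = np.array([0.7, 0.2, 0.5])            # empirical stationary features
s_bar = np.zeros(3)
for _ in range(500):
    p = np.exp(futures @ s_bar - np.logaddexp.reduce(futures @ s_bar))
    s_bar += 0.5 * (f_bar - p @ futures)     # ascend the approximate likelihood
print(p @ futures)                           # ~ [0.7, 0.2, 0.5] at convergence
```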
Experiments
Evaluating with RL
Approximate inference and an approximate likelihood create a problem: how do you evaluate the learned model? Solution: evaluate the model by using it for reinforcement learning.
Domains: Cheesemaze and the 4x3 Maze.
Example: Bouncing Ball Problem
• Noise-free observations: 110 possible observations; 2nd-order Markov
• Noisy observations: 2^110 possible observations; no longer 2nd-order Markov
Example: Bouncing Ball Problem
Each observation is an array of binary random variables.
[Figure: the next two observations O_{t+1}, O_{t+2} form the modeled window F^n | h_t, with features of the future f(F^2 | h_t)]
Example: Bouncing Ball Problem
Each observation is an array of binary random variables.
[Figure: growing the window to three observations O_{t+1}, O_{t+2}, O_{t+3}, i.e. F^{n+1} | h_t, with features of the future f(F^3 | h_t)]
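A sketch of one natural feature construction for this domain; the 10x11 image size (110 pixels) and the choice of raw stacked bits are assumptions, not the paper's specification:

```python
import numpy as np

def future_features(obs_window):
    # f(F^n | h_t): stack the binary pixel arrays of the next n
    # observations into a single feature vector.
    return np.concatenate([np.asarray(o).ravel() for o in obs_window]).astype(float)

rng = np.random.default_rng(5)
window = [rng.integers(0, 2, size=(10, 11)) for _ in range(3)]  # O_{t+1..t+3}
print(future_features(window).shape)                            # (330,) for n = 3
```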
Bouncing Ball Results
The model generalizes across observations, and learning is efficient.
The Robot Domain
[Figure: a sequence of camera images O_{t+1}, O_{t+2}, O_{t+3}]
• Observations are camera images; about 1,000 binary features are extracted from each
• n = 3; f(F^n | h_t) constructs about 12,000 features
• 200,000 training samples
• NMF inference; low-rank approximation; rank-aware line search
Robot Domain Results
The learned model outperforms the best 1st-order Markov model and random policies, a significant accomplishment for PSRs.
Conclusions
• Learning an LL-EFPSR model is straightforward: the expression for ML is defined in terms of observable quantities, and the gradient is analytically tractable
• The model is compatible with an approximate likelihood that has an interpretation based on stationary distributions
• Encouraging experimental results: almost perfect models of small systems, and the ability to start tackling domains larger than any other PSR model
• Future work: tractability, approximations, convexity, feature extraction