270 likes | 289 Views
Hilbert Space Embeddings of Conditional Distributions -- With Applications to Dynamical Systems. Le Song Carnegie Mellon University Joint work with Jonathan Huang, Alex Smola and Kenji Fukumizu. Hilbert Space Embeddings. Summary statistics for distributions:. (mean). (covariance).
E N D
Hilbert Space Embeddings of Conditional Distributions--With Applications to Dynamical Systems Le Song Carnegie Mellon University Joint work with Jonathan Huang, Alex Smola and Kenji Fukumizu
Hilbert Space Embeddings • Summary statistics for distributions: (mean) (covariance) (expected features)
Hilbert Space Embeddings • Reconstruct from for certain kernels! Eg. RBF kernel . • Sample average converges to true mean at .
Hilbert Space Embeddings Structured Prediction Embed Conditional Distributions p(y|x) µ[p(y|x)] Dynamical Systems
Embedding P(Y|X) • Simple case: X binary, Y vector valued • Need to embed • Solution: • Divide observations according to their • Average feature for each group What if X is continuous or structured?
Embedding P(Y|X) • Goal: Estimate embedding from finite sample • Problem: Some X=x are never observed… Embed p(Y|X=x) for x never observed?
Key Idea • If and are close, and will be similar. Generalize to unobserved X=x via a kernel on X
Conditional Embedding Conditional Embedding Operator
Conditional Embedding Operator generalization of covariance matrix
Kernel Estimate • Empirical estimate converges at
Sum and Product Rules Can do probabilistic inference with embeddings!
Dynamical Systems Hidden State Maintain belief using Hilbert space embeddings observation
Advantages • Applies to any domain with a kernel Eg. reals, rotations, permutations, strings • Applies to more general distributions Eg. multimodal, skewed
Evaluation with Linear Dynamics • Linear dynamics model (Kalman filter optimal): Trajectory of Hidden States Observations
-1 -2 x ~ h ~ N(0,10 I), N(0,10 I) Gaussian process noise, Gaussian observation noise 0.35 RKHS 0.3 RMS Error Kalman 0.25 0.2 2 4 6 Training Size ( Steps)
-2 x ~ G h ~ (1,1/3), N(0,10 I) Skewed process noise, small observation noise 0.24 RKHS 0.22 RMS Error Kalman 0.2 2 4 6 Training Size ( Steps)
-2 -2 -2 x ~ h ~ 0.5N(-0.2,10 I)+0.5N(0.2,10 I), N(0,10 I) Bimodal process noise, small observation noise 0.175 RKHS 0.17 Kalman 0.165 RMS Error 0.16 0.155 0.15 2 3 4 5 6 Training Size ( Steps)
Camera Rotation Rendered using POV-ray renderer
Predicted Camera Rotation Dynamical systems with Hilbert space embeddings works better
Video from Predicted Rotation Video Rendered with Predicted Camera Rotation Original Video
Conclusion • Embedding conditional distributions • Generalizes to unobserved X=x • Kernel estimate • Convergence of empirical estimate • Probabilistic inference • Application to dynamical systems learning and inference
The End • Thanks
Future Work • Estimate the hidden states during training? • Dynamical system = chain graphical model; how about tree and general graph topology?
-2 -1 x ~ h ~ N(0,10 I), N(0,10 I) Small process noise, large observation noise 0.5 RKHS 0.4 RMS Error Kalman 0.3 2 4 6 n Training Size (2 Steps)
Dynamics can Help 2.5 2 Trace Measure 1.5 1 -3 -2 -1 0 log10(Noise Variance)
Dynamical Systems Hidden State Maintain belief using Hilbert space embeddings observation
Conditional Embedding Operator • Desired properties: • Covariance operator: generalization of covariance matrix