Learn how to perform unsupervised statistical inference with deep learning by means of variational filtering in dynamical systems. A neural network serves as the optimization engine that performs posterior inference, while the dynamical system provides the structure for variational filtering in complex models. Expectation Maximization, Variational Inference, and variance reduction techniques are covered. Suitable for statisticians, ML researchers, and dynamical systems researchers.
NEURAL VARIATIONAL IDENTIFICATION AND FILTERING Henning Lange, Mario Bergés, Zico Kolter
Variational Filtering: Dynamical Systems + Deep Learning + Statistical Inference (Expectation Maximization, Variational Inference)
Variational Filtering: the statistical inference machinery (EM, VI) is what makes it unsupervised
Variational Filtering: the dynamical system provides the structure
Variational Filtering: deep learning is the optimization engine
Variational Filtering • For the statistician: Expectation Maximization… but with a neural network that tells us where to look.
Variational Filtering • For the ML researcher: a deep neural network… that learns to perform posterior inference.
Variational Filtering • For the dynamical systems person: a non-linear Kalman filter… that is unbiased* and quite fast to evaluate.
Recap • Monte Carlo integration: E_p[f(z)] ≈ (1/N) Σ_i f(z_i) with z_i ~ p(z) • Importance sampling with a proposal q: E_p[f(z)] = E_q[f(z) p(z)/q(z)] ≈ (1/N) Σ_i f(z_i) p(z_i)/q(z_i) with z_i ~ q(z)
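Below is a minimal numpy sketch (not from the talk) of both estimators, assuming a standard normal p and a wider normal proposal q:

```python
# Minimal sketch: Monte Carlo integration and importance sampling for E_p[f(z)].
import numpy as np

rng = np.random.default_rng(0)
f = lambda z: z ** 2                                                     # integrand
log_p = lambda z: -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)                # N(0, 1)
log_q = lambda z: -0.5 * (z / 2) ** 2 - np.log(2 * np.sqrt(2 * np.pi))   # N(0, 4)

N = 100_000

# Plain Monte Carlo: sample from p directly.
z_p = rng.standard_normal(N)
mc_estimate = f(z_p).mean()                # ~ E_p[z^2] = 1

# Importance sampling: sample from q, reweight by p/q.
z_q = 2 * rng.standard_normal(N)
w = np.exp(log_p(z_q) - log_q(z_q))        # importance weights p(z)/q(z)
is_estimate = (w * f(z_q)).mean()          # also ~ 1

print(mc_estimate, is_estimate)
```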
Outline • 1. Statistics • Expectation Maximization • Variational Inference • 2. Deep Learning • Distributions parameterized by Neural Nets • 3. Dynamical Systems • Additional challenges from intractable joint distributions • 4. Variance Reduction
Expectation Maximization in one slide • EM is a technique to perform maximum-likelihood (ML) inference of the parameters θ in a latent variable model (unsupervised learning) • Latent variable z: on/off state of the appliances • Coordinate ascent on the free energy F(q, θ) • E-Step: set q(z) = p(z | x, θ) • M-Step: increase F w.r.t. θ, i.e. maximize E_q[log p(x, z | θ)] • Neal, Radford M., and Geoffrey E. Hinton. "A view of the EM algorithm that justifies incremental, sparse, and other variants." Learning in Graphical Models, 1998.
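The coordinate-ascent view in full (standard, following Neal & Hinton; the notation x, z, θ and the symbol F for the free energy are assumed, not taken from the slides):

```latex
\begin{aligned}
\mathcal{F}(q,\theta) &= \mathbb{E}_{q(z)}\!\left[\log p(x,z\mid\theta)\right] + H(q)
  = \log p(x\mid\theta) - \mathrm{KL}\!\left(q(z)\,\|\,p(z\mid x,\theta)\right) \\
\text{E-step: } q^{(t+1)} &= \arg\max_{q}\ \mathcal{F}(q,\theta^{(t)}) = p(z\mid x,\theta^{(t)}) \\
\text{M-step: } \theta^{(t+1)} &= \arg\max_{\theta}\ \mathcal{F}(q^{(t+1)},\theta)
  = \arg\max_{\theta}\ \mathbb{E}_{q^{(t+1)}}\!\left[\log p(x,z\mid\theta)\right]
\end{aligned}
```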
Example: Non-Intrusive Load Monitoring • p(z) = some prior, e.g. encouraging sparsity • Expectation Maximization allows for learning θ • θ could constitute reactive/active power of appliances, or waveforms
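A toy generative-model sketch of this setup; the additive form, the Bernoulli prior and the Gaussian noise below are illustrative assumptions, not the paper's exact model:

```python
# Hypothetical toy NILM model: x_t = sum_k z_{t,k} * theta_k + noise,
# where theta_k is appliance k's power draw and z_{t,k} in {0, 1} its on/off state.
import numpy as np

rng = np.random.default_rng(1)
K, T = 5, 200                                    # appliances, time steps
theta = rng.uniform(50, 500, size=K)             # per-appliance power (watts)
prior_on = 0.2                                   # sparse prior: most appliances off
z = (rng.random((T, K)) < prior_on).astype(float)  # latent on/off states
x = z @ theta + rng.normal(0, 5, size=T)         # observed aggregate signal
```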
Intractable posterior distributions • EM requires computation of the posterior p(z | x, θ) • For many interesting latent variable models, computing p(z | x, θ) is intractable
Intractable posterior distributions • For many interesting latent variable models, computing p(z | x, θ) is intractable • NILM is one of them: the latent domain grows exponentially with the number of appliances
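A brute-force illustration of the blow-up, assuming binary on/off states so that K appliances give 2^K joint configurations (the toy model is the one sketched above):

```python
# Exact posterior by enumerating all 2^K configurations; feasible only for
# tiny K (for 30 appliances, 2^30 is already about 1e9 states).
import itertools
import numpy as np

def exact_posterior(x_t, theta, prior_on=0.2, sigma=5.0):
    K = len(theta)
    configs = np.array(list(itertools.product([0, 1], repeat=K)))   # 2^K rows
    log_prior = (configs * np.log(prior_on)
                 + (1 - configs) * np.log(1 - prior_on)).sum(axis=1)
    log_lik = -0.5 * ((x_t - configs @ theta) / sigma) ** 2
    log_joint = log_prior + log_lik
    post = np.exp(log_joint - log_joint.max())
    return configs, post / post.sum()

configs, post = exact_posterior(x_t=320.0, theta=np.array([60.0, 150.0, 300.0]))
print(configs[np.argmax(post)])            # most likely on/off configuration
```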
Variational Inference in two slides • Expectation Maximization requires the exact posterior p(z | x, θ) • Variational Inference instead maximizes a lower bound over a tractable family of distributions q • Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K. (1999). An introduction to variational methods for graphical models. Machine Learning, 37(2), 183-233.
Variational Inference in two slides • Variational Inference: log p(x | θ) ≥ E_q[log p(x, z | θ)] - E_q[log q(z)], the Evidence Lower BOund (ELBO)
Variational Inference in two slides • Maximizing the ELBO w.r.t. θ: extract waveforms that best explain the data!
Variational Inference in two slides • Maximizing the ELBO w.r.t. q: posterior inference!
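Written out (standard derivation; the notation x, z, θ, q is the same as above):

```latex
\log p(x\mid\theta)
  = \log \sum_{z} p(x,z\mid\theta)
  = \log \mathbb{E}_{q(z)}\!\left[\frac{p(x,z\mid\theta)}{q(z)}\right]
  \ge \mathbb{E}_{q(z)}\!\left[\log p(x,z\mid\theta)\right] - \mathbb{E}_{q(z)}\!\left[\log q(z)\right]
  = \mathrm{ELBO}(q,\theta)
```

by Jensen's inequality, with equality iff q(z) = p(z | x, θ).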
Connection: Deep Learning • We choose q to be parameterized by a neural network • More detail:
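A minimal sketch of what "q parameterized by a neural network" could look like: a small MLP mapping an observation window to per-appliance on-probabilities, i.e. a mean-field Bernoulli q. The architecture and sizes are illustrative assumptions, not the paper's:

```python
# Hypothetical amortized q(z | x): an MLP maps an observation window to
# per-appliance on-probabilities; q factorizes over appliances (mean field).
import numpy as np

rng = np.random.default_rng(2)
D, H, K = 50, 64, 5                            # input window, hidden units, appliances
W1, b1 = 0.1 * rng.standard_normal((H, D)), np.zeros(H)
W2, b2 = 0.1 * rng.standard_normal((K, H)), np.zeros(K)

def q_probs(x_window):
    """Per-appliance on-probabilities q(z_k = 1 | x)."""
    h = np.tanh(W1 @ x_window + b1)
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # sigmoid

def sample_q(x_window, n_samples=16):
    p = q_probs(x_window)
    return (rng.random((n_samples, K)) < p).astype(float), p

z_samples, p = sample_q(rng.standard_normal(D))
```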
Connection: Dynamical Systems • Appliances evolve over time • The temporal dynamics are important (ignoring them invites overfitting) …
Variational Filtering • Apply variational inference to the filtering posterior p(z_t | x_1, …, x_t): infer the current latent state given all observations up to time t
Intractable joint distribution • When modeling temporal dependencies, even the joint distribution becomes intractable • It is intractable for two reasons!
Reason 1: Intractable joint distribution • When modeling temporal dependencies, even the joint distribution becomes intractable • Remedy: importance sampling and MC integration!
Reason 2: Approximating the data likelihood • Remedy: importance sampling and MC integration!
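A sketch of the resulting estimator for the data likelihood: importance sampling with the variational q as proposal (a standard identity; the toy model and the factorized Bernoulli q are assumptions carried over from the snippets above):

```python
# p(x) = sum_z p(x, z) = E_q[ p(x, z) / q(z) ]  ~  (1/N) sum_i p(x, z_i)/q(z_i),  z_i ~ q
import numpy as np

rng = np.random.default_rng(3)

def log_joint(x_t, z, theta, prior_on=0.2, sigma=5.0):
    log_prior = (z * np.log(prior_on) + (1 - z) * np.log(1 - prior_on)).sum(axis=-1)
    log_lik = -0.5 * ((x_t - z @ theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    return log_prior + log_lik

def estimate_log_px(x_t, theta, q_p, n_samples=1024):
    """Importance-sampling estimate of log p(x_t) using a factorized Bernoulli q."""
    z = (rng.random((n_samples, len(theta))) < q_p).astype(float)
    log_q = (z * np.log(q_p) + (1 - z) * np.log(1 - q_p)).sum(axis=-1)
    log_w = log_joint(x_t, z, theta) - log_q          # log importance weights
    return np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()

theta = np.array([60.0, 150.0, 300.0])
print(estimate_log_px(320.0, theta, q_p=np.array([0.5, 0.5, 0.5])))
```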
Putting the pieces together • This is tractable!
Are we done? • Sadly no, the gradient estimator w.r.t. the parameters of q has high variance. However, there is a remedy.
VI: Variance • Score-function (REINFORCE) gradient: ∇_φ ELBO = E_q[(log p(x, z | θ) - log q_φ(z)) ∇_φ log q_φ(z)] • Unbiased but high variance!
VI: Variance • More generally, if b is independent of z: E_q[(f(z) - b) ∇_φ log q_φ(z)] = E_q[f(z) ∇_φ log q_φ(z)]
VI: Variance • Subtracting such a b therefore leaves the estimator unbiased. What's an appropriate b?
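Why any b that does not depend on z leaves the gradient unbiased (standard identity; the symbols b and φ are generic notation):

```latex
\mathbb{E}_{q_\phi(z)}\!\left[b\,\nabla_\phi \log q_\phi(z)\right]
  = b \sum_{z} q_\phi(z)\,\frac{\nabla_\phi q_\phi(z)}{q_\phi(z)}
  = b\,\nabla_\phi \sum_{z} q_\phi(z)
  = b\,\nabla_\phi 1
  = 0
```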
VI: Variance Reduction • The inability to compute the data likelihood log p(x) exactly causes high variance • Why don't we just use an approximation of log p(x) as a control variate?
Variance reduction • Samples are drawn without replacement from Q • This is not a trivial problem!
Variance reduction • Samples are drawn without replacement from Q • In order to reduce the variance of the estimator, we subtract a control variate
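An illustrative sketch (not the paper's estimator) of how subtracting a control variate reduces the variance of a score-function gradient estimate, using a single Bernoulli latent and a fixed target f:

```python
# Score-function gradient of E_q[f(z)] w.r.t. the Bernoulli logit, with and
# without a baseline b; both are unbiased, but the baseline shrinks variance.
import numpy as np

rng = np.random.default_rng(4)
logit = 0.3
p = 1.0 / (1.0 + np.exp(-logit))            # q(z = 1)
f = lambda z: 10.0 + 2.0 * z                # arbitrary target with a large offset

def grad_samples(n, baseline=0.0):
    z = (rng.random(n) < p).astype(float)
    score = z - p                           # d/d logit of log q(z) for a Bernoulli
    return (f(z) - baseline) * score

plain = grad_samples(100_000)
with_cv = grad_samples(100_000, baseline=10.0 + 2.0 * p)   # baseline ~ E_q[f(z)]

print("means:", plain.mean(), with_cv.mean())      # both ~ true gradient 2 p (1 - p)
print("variances:", plain.var(), with_cv.var())    # second is much smaller
```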