Bayes, birds and brains: applications of inference and probabilistic modelling Stephen Roberts Pattern Analysis & Machine Learning Research Group University of Oxford http://www.robots.ox.ac.uk/~parg
Introduction • Bayesian inference has a profound impact on the principled handling of uncertainty in practical computation • What this talk aims to do: • Give an overview of Bayesian inference applied to several real-world problem domains • What it does not aim to do: • Give endless equations – these are important and elegant, but they are in the open literature
What’s wrong with sampling? • Nothing – apart from speed and the occasional frequentist way the samples are used… • Not much we can do about speed, but there are lots of clever methods out there which help • Bayesian sampling (using Gaussian processes) • Bayes–Hermite Quadrature [O’Hagan, 1992] • Bayesian Monte Carlo [Rasmussen and Ghahramani, 2003] – sketched below • Variational Bayes
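To fix ideas, here is the basic Bayesian Monte Carlo setup in standard notation (a textbook summary of [Rasmussen and Ghahramani, 2003], not material from the slide itself): a Gaussian process prior on the integrand makes the integral itself a Gaussian random variable, whose posterior mean is a weighted sum of the observed function values.

```latex
Z = \int f(x)\, p(x)\, dx, \qquad f \sim \mathcal{GP}(0, k)
% conditioned on evaluations \mathbf{f} = (f(x_1), \dots, f(x_n)):
\mathbb{E}[Z \mid \mathbf{f}] = \mathbf{z}^{\top} K^{-1} \mathbf{f},
\qquad z_i = \int k(x, x_i)\, p(x)\, dx, \quad K_{ij} = k(x_i, x_j)
```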
Variational Bayes – 1 • Log evidence bounded below by the variational Free Energy
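For reference, the bound in full, with q(θ) a tractable approximating posterior (standard VB notation, reconstructed here rather than copied from the slide):

```latex
\log p(D) \;=\; \log \int p(D, \theta)\, d\theta
\;\ge\; \int q(\theta)\, \log \frac{p(D, \theta)}{q(\theta)}\, d\theta
\;=\; F[q]
```

The gap is KL(q(θ) ‖ p(θ|D)), so maximizing F[q] drives q towards the true posterior.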
Variational Bayes – 2 [Figure: a correlated joint posterior p(x1,x2) and its factored approximation q(x1)q(x2)] • A slow and (often) painful derivation leads to an iterative node update for DAGs – sketched below for a toy model • This converges to a local optimum – like EM and many other energy-minimization approaches – get the priors right!
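As a concrete (if toy) instance of such an iterative update, here is a coordinate-ascent VB sketch for a univariate Gaussian with unknown mean and precision under a factored q(μ)q(τ). The model and updates are standard textbook material (e.g. Bishop, PRML, Sec. 10.1.3), not the DAGs used in the talk:

```python
import numpy as np

# Coordinate-ascent VB for x_i ~ N(mu, 1/tau) with a factored
# approximation q(mu, tau) = q(mu) q(tau).
# Priors: mu ~ N(mu0, 1/(lam0*tau)), tau ~ Gamma(a0, b0).
rng = np.random.default_rng(0)
x = rng.normal(2.0, 0.5, size=200)
N, xbar, xsq = len(x), x.mean(), (x ** 2).sum()

mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0   # broad priors
E_tau = a0 / b0                           # initial guess for E[tau]

for it in range(50):
    # q(mu) = N(mu_N, 1/lam_N) -- depends on the current E[tau]
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    E_mu, E_mu2 = mu_N, mu_N ** 2 + 1.0 / lam_N

    # q(tau) = Gamma(a_N, b_N) -- depends on the current q(mu) moments
    a_N = a0 + 0.5 * (N + 1)
    b_N = b0 + 0.5 * (xsq - 2 * E_mu * N * xbar + N * E_mu2
                      + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0 ** 2))
    E_tau = a_N / b_N

print(f"posterior mean of mu ~ {mu_N:.3f}, of tau ~ {E_tau:.3f}")
```

Each pass updates one factor given the current moments of the other – exactly the local, fixed-point character the slide describes.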
Variational Bayes – 3 • Some relief via Variational Message Passing • Same update equations as VB but at a fraction of the pain • Conjugate exponential family only • Pearl-style message passing on the graphical model using sufficient statistics only • For many applications the factored nature of q degrades performance – need non-factored proposals – extra computation (e.g. some VB models with mixture-model nodes)
Priors & model selection • Sensitivity to priors • posterior distributions are conjugate with the priors • empirically this can be a problem – know the domain • Model selection • evaluate a set of models by their variational free energy (VFE); rank or integrate – see below • use VFE in a ‘quasi-RJMCMC’ approach • use ridge-regression (ARD, weight-decay) priors
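The ranking/integration step rests on using the converged free energy as a surrogate for the log evidence; in outline (standard VB model selection, with F_m the free energy of model m and a flat prior over models assumed for illustration):

```latex
\log p(D \mid m) \;\approx\; F_m
\quad\Longrightarrow\quad
p(m \mid D) \;\approx\; \frac{e^{F_m}}{\sum_{m'} e^{F_{m'}}}
```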
Simple example – ICA • ICA (Bell & Sejnowski, Attias, Amari, …) • Bayesian ICA (Roberts 1998, Attias 1999, Miskin & MacKay 2000, Choudrey & Roberts 2000)
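For readers who want to experiment, a minimal mixing-and-unmixing demo is below. It uses scikit-learn’s FastICA purely as a stand-in; the Bayesian ICA cited above additionally infers posterior distributions over the mixing matrix, sources and noise, which FastICA does not:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Toy linear-mixing demo: recover independent sources s from
# observations x = A s + noise.
rng = np.random.default_rng(1)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.cos(3 * t))]   # two independent sources
A = np.array([[1.0, 0.5], [0.4, 1.0]])             # unknown mixing matrix
X = S @ A.T + 0.01 * rng.normal(size=(2000, 2))    # noisy observations

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)   # estimated sources
print(ica.mixing_)             # estimated mixing matrix
```

Recovered sources come back only up to permutation, sign and scale – the usual ICA indeterminacies.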
Mixtures of ICAs (Choudrey & Roberts, 2001)
A cautionary conclusion… • In high-noise regimes, use ARD to focus on a small subset of models • These are then investigated in more detail using the variational free energy
Priors • If we have prior knowledge regarding the sources or the mixing process, we can use it • Spatial information • Positive mixing • Positive sources • Structured observations
ICA with different priors Which is ‘correct’ though?
Structure priors • To be an ICA matrix, the unmixing matrix must lie on the manifold of decorrelating matrices. These form ‘great circles’ in the matrix space – see below. • Can parameterize using co-ordinates on the manifold. • Where do the priors lie?
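One standard way to write the manifold condition (stated here from the decorrelating-manifold literature, e.g. Everson & Roberts, rather than from the slide): with Σ_x the data covariance, W is decorrelating when the unmixed covariance is diagonal, and all such matrices are a rescaled orthogonal orbit of a whitening matrix.

```latex
W \Sigma_x W^{\top} = \Lambda \ (\text{diagonal})
\quad\Longleftrightarrow\quad
W = D\, Q\, \Sigma_x^{-1/2}, \qquad Q Q^{\top} = I, \ \ D \ \text{diagonal}
```

The orthogonal factor Q traces out the ‘great circles’ of the slide.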
Gaussian priors Gaussian priors on the mixing process themselves form great circles – they have little impact if we already compute on the decorrelating manifold, as they are aligned with it.
Structure priors [Figure: potential from a brain source – dipole potential (Knuth)] • Sensor coupling has spatial structure: nearby sensors have similar coupling weights – see the sketch below • Gaussian process prior: still gives a great circle in matrix space, but very informative, as it is not aligned along the decorrelating manifold
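A minimal sketch of such a spatially structured prior, assuming sensors at known 2-D positions and a squared-exponential kernel (the layout and lengthscale are illustrative, not the values used in the talk):

```python
import numpy as np

# Spatially structured prior on one column of the mixing matrix:
# nearby sensors get correlated coupling weights via a GP with a
# squared-exponential kernel over sensor positions.
rng = np.random.default_rng(2)
pos = rng.uniform(0, 1, size=(16, 2))   # illustrative 2-D sensor layout
ell = 0.3                               # illustrative lengthscale

d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * d2 / ell ** 2)        # prior covariance of the weights

# Draw one prior sample of the coupling weights for a single source:
a = rng.multivariate_normal(np.zeros(16), K + 1e-8 * np.eye(16))
print(a.round(2))
```

Mixing-matrix columns drawn from this prior vary smoothly across the sensor array – which is what pulls the solution off the uninformative great circles.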
Phantom head experiments [Figure: recovered sources without the prior vs. with the prior]
Brain-Computer Interfaces ‘direct’ control in real time using ‘thought’
Motor cortex • When we plan a movement, changes take place in the motor cortex, whether or not the movement takes place. • When we change cognitive task, changes take place in the cortex.
Cursor control – real-time BCI [Figure: cursor-control performance with dT = 50 ms; Bayes with rejection vs. Bayes vs. baseline, showing max / median / min]
The curse of feedback [Figure: information rate in bits against t (secs)]
Information Engines “If all you have is a hammer, everything looks like a nail.” [Diagram: a raw binary DATA stream feeds an ENGINE (MODEL) that outputs INFORMATION such as P(action|data) = 0.95 – an ‘entropy machine’ turning potential information into useful information]
Inside or outside the box? • The inferences we make, and the actions decided upon, have an impact on the data • Learning with changing objectives
Sequential Bayesian inference • Particle filter (SIR) • Humble (variational) Kalman filter: Bayesian inference assuming a generalized (non-)linear Gaussian system – see the sketch below • Adaptive system using sequential variational Bayes • BCI application • (Musical score following)
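To ground the ‘humble Kalman filter’ end of this family, here is one predict/update cycle for a linear-Gaussian model. The matrices describe an illustrative 1-D random walk with a noisy readout, not the BCI model of the talk:

```python
import numpy as np

# One predict/update cycle of a linear-Gaussian Kalman filter -- the
# simplest member of the sequential-inference family on this slide.
A, C = np.array([[1.0]]), np.array([[1.0]])   # state transition, observation
Q, R = np.array([[0.01]]), np.array([[0.1]])  # process, observation noise

def kf_step(m, P, y):
    """Propagate the Gaussian belief N(m, P) through one time step."""
    m_pred = A @ m                        # predict
    P_pred = A @ P @ A.T + Q
    S = C @ P_pred @ C.T + R              # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    m_new = m_pred + K @ (y - C @ m_pred) # update with the observation
    P_new = (np.eye(len(m)) - K @ C) @ P_pred
    return m_new, P_new

m, P = np.zeros((1, 1)), np.eye(1)
for y in [0.9, 1.1, 1.0, 0.95]:           # a short stream of observations
    m, P = kf_step(m, P, np.array([[y]]))
print(m.ravel(), P.ravel())
```

The particle-filter (SIR) and variational variants replace the closed-form Gaussian belief with samples or with an iteratively fitted approximation, but follow the same predict–update rhythm.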
Generalised non-linear dynamic classifier • Copes with missing inputs & labels, input noise and bit errors on labels, as well as time-delayed information (Penny, Sykacek, Lowne)
Hidden Markov birds… • Global Positioning System (GPS) • 15 g units • Strapped to the back of a bird • Gives a position every second (Roberts, Guilford, Biro & Lau, JTB, 2004/5)
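A toy version of the inference behind ‘hidden Markov birds’: forward filtering in a two-state HMM. The states, transition/emission probabilities and observation symbols (imagine a discretized feature of the GPS track, such as heading change) are all illustrative:

```python
import numpy as np

# Forward algorithm for a 2-state HMM, run online over a symbol stream.
T = np.array([[0.95, 0.05],   # state transition matrix
              [0.10, 0.90]])
E = np.array([[0.8, 0.2],     # p(obs symbol | state)
              [0.3, 0.7]])
pi = np.array([0.5, 0.5])     # initial state distribution
obs = [0, 0, 1, 1, 1, 0, 1]   # a short illustrative symbol stream

alpha = pi * E[:, obs[0]]
alpha /= alpha.sum()          # normalize: p(state | obs so far)
for y in obs[1:]:
    alpha = (alpha @ T) * E[:, y]
    alpha /= alpha.sum()
print(alpha)                  # filtered state posterior at the final step
```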