320 likes | 533 Views
Overview. MCMC MethodsMetropolis-Hastings and Gibbs SamplersDesign ConsiderationsCase Study: Gabor Regression Models. MCMC Methods. MCMC methods are sophisticated and general methods for simulation from a complex probability distribution, say p(x) ? x may be high dimensional, p() highly non-Gaus
E N D
1. MCMC Methods in Harmonic Models Simon Godsill
Signal Processing Laboratory
Cambridge University Engineering Department
sjg@eng.cam.ac.uk
www-sigproc.eng.cam.ac.uk/~sjg
2. Overview MCMC Methods
Metropolis-Hastings and Gibbs Samplers
Design Considerations
Case Study: Gabor Regression Models
3. MCMC Methods
MCMC methods are sophisticated and general methods for simulation from a complex probability distribution, say p(x) x may be high dimensional, p() highly non-Gaussian, multimodal:
Given a set of samples from p(x) we can compute Monte Carlo expectations for any quantities of interest by ergodic averages:
4. MCMC Contd.
In a Bayesian setting p(x) will typically be the posterior distribution:
Underlying concept is to construct an irreducible, aperiodic Markov chain having p(x) as its stationary distribution and transition kernel K(dx;x)
Initialise chain at arbitrary state x(0) (say, random) and simulate repeatedly from K(dx;x) until convergence achieved
Convergence in distribution is guaranteed under mild conditions, easily verified for most models
5. MCMC, contd. Rates of convergence are hard to compute lots of theory, but not typically applicable in practice.
However, many models, e.g. many harmonic modelling cases, can be proven to have geometric convergence rates.
7. Componentwise M-H In most cases this wont be feasible as x is high dimensional -> low acceptance rates, poor convergence
Instead, split x into components:
Then perform M-H on each component k=1,
,N:
Propose
Accept with probability
8. Gibbs Sampler Possibly the simplest form of MCMC choose
(the `full conditional distribution of xk)
Acceptance probability is 1 i.e. all moves accepted.
9. Other types of MCMC Reversible Jump MCMC extension of M-H to cases where x can have varying dimension (e.g. in sparsity estimation) see Green (1995) Biometrika
Perfect simulation special MCMC schemes that achieve exact samples from p(x) highly desirable, but slow and not yet practical for many cases
10. Design Issues and Recommendations A basic understanding of MCMC is relatively easy, but it is not so easy to construct effective and efficient samplers
Some of the main considerations are:
How to partition x into components (need not be same size, and usually arent)
What algorithms to use M-H, Gibbs, something else? In general Gibbs should only be used if the full conditionals are straightforward to sample from, e.g. Gaussian, gamma, etc., otherwise use M-H.
11. (Blocking) its nearly always best to group large numbers of components of x into single partitions xk, provided efficient M-H or Gibbs steps can be constructed for the partitions
(Rao-Blackwellisation) a related issue is marginalisation it is better (in terms of estimator variance) to integrate out parameters analytically again, subject to being able to construct efficient samplers on the remaining space:
12. References for MCMC MCMC in Practice Gilks et al Chapman and Hall (1996)
Monte Carlo Statistical Methods Robert and Casella Springer (1999)
13. MCMC Case study Gabor Regression models Now consider design of a sampler for harmonic models. Full details forthcoming as
Wolfe, Godsill and Ng (2004) - Bayesian variable selection and regularisation for time-frequency surface estimation Journal of Royal Statistical Society (Series B methodological)
(See also Wolfe and Godsill (NIPS 2002))
14. Gabor Regression Models Consider models of the form
G is a matrix of Gabor atoms here we chose an overcomplete dictionary with 2* redundancy
We will seek sparse representations with time-frequency structure encoded through prior distributions on cks
For the moment consider case of fixed, known se and sck
15. Gabor regression models Likelihood function is
Posterior probability density for c is
16. Posterior for c:
17. Gibbs Sampler blocking structures However, for large Gabor models, the matrix inversion will be very slow, and here we could look at reduced- dimension blocking structures
Then Gibbs sampler would proceed as follows, for k=1,
,K:
Its instructive to look at the form of this conditional pdf:
18. Full conditional for ck [Gk contains columns of G corresponding to partition k, and G-k the remaining columns.]
19.
This form of Gibbs sampler can be very cheap computationally
The interest in this work is to extend the modelling capabilities provided by other algorithms giving new forms of sparsity and structure. The extra steps are added in modular fashion, retaining the conditionally Gaussian structure of the coefficients and the efficient implementation
20. Sampling se First, we allow estimation of the noise floor by sampling se, assuming an inverted-gamma (IG) prior p(se2):
Under this prior (conjugate)
the full conditional takes the
same form, which is easily
sampled by standard methods
(e.g. MATLAB) :
21. Sampling coefficient parameters Next, place a structured prior distribution on the Gabor coefficients. First make them heavy-tailed to match real audio signals. This is done using Scale Mixtures of Normals (see Godsill and Rayner (IEEE Tr. Sp. And Audio 1998) for an audio restoration example).
Simply assign a prior to the variance of each ck:
Implies a non-Gaussian heavy-tailed distribution for ck
22. Priors for sck Choice of p(sck2) determines the implied heavy-tailed distribution p(ck)
In simplest case adopt the IG prior as this is conjugate. Then implied p(ck) is Students t distributed:
IG prior has Jeffreys and
exponential limiting cases, so the
family can encompass many of the
sparseness-inducing cases.
Again, the IG prior is conjugate and
Leads to a simple Gibbs sampler step:
23. Direct Sparsity Modelling Other choices of p(sck2) lead to other heavy-tailed distributions, e.g. it is possible to get a-stable or Generalised Gaussian coefficients with other choices. In these cases M-H would be used to do the sampling, see e.g. Godsill and Kuruoglu (1999 CUED Tech. Rep.).
A further addition that is easily encorporated into the MCMC is direct estimation of sparsity.
This is an important addition to the models and does not compromise the guaranteed convergence properties of the methods.
We can achieve this by allowing finite probability mass at zero in p(sck2):
24. Direct Sparsity Modelling Prior with point mass at zero:
Where gk 2{0,1} is a binary indicator variable specifying whether coefficient ck is active or inactive.
Structure is introduced at this point, through priors on the time-frequency indicator field {gk}
We use Markov chain or Markov random field priors to encourage continuity across time (tones), frequency (transients), or both:
The indicator field is also sampled using Gibbs sampling details not given here no time left
25. Final Details We also sample the parameters of
requiring one Gibbs and one M-H step.
26. Interpreting the MCMC output Assume that the MCMC has converged and initial `burn-in deleted:
Coefficient estimation:
Noise reduction:
Estimating the sparsity coefficients:
How many coefficients are active?
27. Results
28. Results, contd.
29. Typical output from the program
30. Conclusion Why use MCMC methods in harmonic models?
Extend the range of models computable
Guaranteed convergence (in the limit)
Computations can be quite cheap
Code would contain same building blocks as EM, IRLS or basis pursuit for similar models easy to modify to MCMC for baseline comparison
Its really not as complicated or slow as people think!!
Why not use MCMC methods?
Can be computationally expensive
Convergence diagnostics unreliable
You may not want to explore new models
31. References C.P. Robert and G. Casella, Monte Carlo StatisticalMethods, New York: Springer Verlag, 1999
W. R. Gilks and S. Richardson and D. J. Spiegelhalter, Markov chain Monte Carlo in practice, London: Chapman and Hall, 1996
P. J. Green, Reversible Jump Markov-chain Monte Carlo computation and Bayesian model determination, Biometrika, 82(4), pp. 711-732, 1995
32. Harmonic models and MCMC SJG references see www-sigproc.eng.cam.ac.uk/~sjg P. J. Wolfe, S. J. Godsill, and W.J. Ng. Bayesian variable selection and regularisation for time-frequency surface estimation Journal of the Royal Statistical Society, Series B, 2004. Read paper (with discussion). To Appear.
M.Davy and S. J. Godsill. Bayesian harmonic models for musical signal analysis (with discussion). In J.M. Bernardo, J.O. Berger, A.P. Dawid, and A.F.M. Smith, editors, Bayesian Statistics VII. Oxford University Press, 2003.
P. J. Wolfe and S. J. Godsill. Bayesian modelling of time-frequency coefficients for audio signal enhancement. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, Cambridge, MA. MIT Press, 2002.
S. J. Godsill and P. J. W. Rayner. Digital Audio Restoration: A Statistical Model-Based Approach. Berlin: Springer, ISBN 3 540 76222 1, September 1998.
S. J. Godsill and P. J. W. Rayner. Robust reconstruction and analysis of autoregressive signals in impulsive noise using the Gibbs sampler. IEEE Trans. on Speech and Audio Processing, 6(4):352-372, July 1998.
S. J. Godsill and E. E. Kuruoglu. Bayesian inference for time series with heavy-tailed symmetric alpha -stable noise processes. In Proc. Applications of heavy tailed distributions in economics, engineering and statistics, June 1999. Washington DC, USA. CUED Tech. Rep.