Analytic Solution of Hierarchical Variational Bayes Approach in Linear Inverse Problem
Shinichi Nakajima (Nikon Corporation), Sumio Watanabe (Tokyo Institute of Technology)
Contents
• Introduction
  • Linear inverse problem
  • Hierarchical variational Bayes [Sato et al. 04]
  • James-Stein estimator
  • Purpose
• Theoretical analysis
  • Setting
  • Solution
• Discussion
• Conclusions
Linear inverse problem

Example: magnetoencephalography (MEG).

Model: y = L a + ε
• y : observable (magnetic field detected by N detectors)
• L : constant matrix (lead field matrix)
• a : parameter to be estimated (electric current at M sites)
• ε : noise (observation noise)

Because the number of detectors N is much smaller than the number of sites M, the problem is ill-posed!
Methods for the ill-posed problem

Model: y = L a + ε. Prior: a ~ N(0, B⁻² I).

1. Minimum-norm maximum likelihood.
2. Maximum a posteriori (MAP), where B⁻² is constant.
3. Hierarchical Bayes: B⁻² is also a parameter to be estimated!

Methods 1 and 2 behave similarly; method 3 is very different from 1 and 2.
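As a concrete illustration of methods 1 and 2, here is a minimal sketch (the toy dimensions, the matrix L, and the noise level are invented for illustration, not taken from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ill-posed problem: N = 5 detectors, M = 20 sites (N << M).
N, M = 5, 20
L = rng.standard_normal((N, M))   # lead field matrix (constant)
a_true = np.zeros(M)
a_true[:3] = [2.0, -1.5, 1.0]     # only a few sites are active
sigma = 0.1
y = L @ a_true + sigma * rng.standard_normal(N)

# 1. Minimum-norm maximum likelihood: a = L^+ y (Moore-Penrose inverse).
a_ml = np.linalg.pinv(L) @ y

# 2. MAP with constant prior variance B^{-2}: the Tikhonov-regularized
#    solution a = (L^T L + sigma^2 B^2 I)^{-1} L^T y.
B2 = 1.0                          # B^2 fixed, not estimated
a_map = np.linalg.solve(L.T @ L + sigma**2 * B2 * np.eye(M), L.T @ y)
```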
Hierarchical Bayes, a.k.a. Automatic Relevance Determination (ARD) [MacKay94, Neal96]

Model: y = L a + ε. Prior: a_m ~ N(0, B_m⁻²) for each element m.

Estimate B⁻² from the observation, introducing a hyperprior over B⁻². If a and B⁻² are estimated by Bayesian methods, many small elements become zero (relevance determination). Why? Singularities and hierarchy. See [9] if interested.
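In symbols, the ARD hierarchy can be sketched as follows (the Gamma hyperprior is a common choice in the ARD literature; the talk does not specify its exact form):

```latex
\begin{align}
y &= L a + \varepsilon, & \varepsilon &\sim \mathcal{N}(0, \sigma^2 I_N), \\
a_m \mid B_m^{-2} &\sim \mathcal{N}(0, B_m^{-2}), & m &= 1, \dots, M, \\
B_m^{-2} &\sim p(B_m^{-2}) && \text{(hyperprior, e.g. a Gamma distribution).}
\end{align}
```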
Hierarchical variational Bayes

But Bayes estimation requires huge computational costs, so apply VB [Sato et al. 04].

Variational method: minimize the free energy F(r) over a trial posterior r(a, B⁻²). Without any restriction, the optimum equals the Bayes posterior.

Restriction (factorization): r(a, B⁻²) = r_a(a) r_B(B⁻²).
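The claim "optimum = Bayes posterior" follows from the standard free-energy decomposition, written here with a generic parameter w = (a, B⁻²) since the talk's own equation was not preserved:

```latex
F(r) = \int r(w) \log \frac{r(w)}{p(y, w)} \, dw
     = \mathrm{KL}\bigl(r(w) \,\|\, p(w \mid y)\bigr) - \log p(y).
```

Since −log p(y) does not depend on r, minimizing F over unrestricted r recovers the Bayes posterior; VB instead minimizes F under the factorization restriction.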
James-Stein estimator [James&Stein61]

K-dimensional mean estimation (regular model): n samples x₁, …, xₙ ~ N(μ, I_K); the ML estimator is the arithmetic mean x̄.

Domination of estimator a over estimator b: risk(a) ≤ risk(b) for any true μ, with strict inequality for a certain true μ.

ML is efficient (never dominated by any unbiased estimator), but it is inadmissible (dominated by a biased estimator) when K ≥ 3 [Stein56].

James-Stein (JS) estimator: μ̂_JS = (1 − (K − 2)/(n‖x̄‖²)) x̄, i.e., x̄ multiplied by a shrinkage factor.

[Figure: risk of ML vs. JS (K = 3) as a function of the true mean.]

A certain relation between EB (empirical Bayes) and JS was discussed in [Efron&Morris73].
Purpose

[Sato et al. 04] derived a simple iterative algorithm based on HVB for the MEG application and experimentally showed its good performance. We theoretically analyze the HVB approach, derive its solution, and discuss the relation between HVB and the positive-part JS estimator, focusing on a simplified version of Sato's approach.

Positive-part JS: μ̂_PJS = max(0, 1 − (K − 2)/(n‖x̄‖²)) x̄, where the clipped factor is the degree of shrinkage.
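A minimal numerical sketch of the JS and positive-part JS estimators (the simulation setup — K, n, and the true mean — is invented for illustration):

```python
import numpy as np

def js(xbar, n):
    """James-Stein estimator for a K-dim mean from n unit-variance samples."""
    K = xbar.size
    factor = 1.0 - (K - 2) / (n * np.sum(xbar**2))
    return factor * xbar

def js_plus(xbar, n):
    """Positive-part James-Stein: shrinkage factor clipped at zero."""
    K = xbar.size
    factor = max(0.0, 1.0 - (K - 2) / (n * np.sum(xbar**2)))
    return factor * xbar

# Monte-Carlo risk comparison: ML vs positive-part JS (K = 3).
rng = np.random.default_rng(0)
K, n, trials = 3, 10, 20000
mu = np.array([0.5, 0.0, 0.0])     # a "small" true mean, where JS helps most
risk_ml = risk_js = 0.0
for _ in range(trials):
    xbar = mu + rng.standard_normal(K) / np.sqrt(n)   # xbar ~ N(mu, I/n)
    risk_ml += np.sum((xbar - mu)**2)
    risk_js += np.sum((js_plus(xbar, n) - mu)**2)
print(risk_ml / trials, risk_js / trials)  # the JS risk should be smaller
```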
Contents
• Introduction
  • Linear inverse problem
  • Hierarchical variational Bayes [Sato et al. 04]
  • James-Stein estimator
  • Purpose
• Theoretical analysis
  • Setting
  • Solution
• Discussion
• Conclusions
Setting

Consider time-series data.

ARD model: y(u) = L a(u) + ε(u), u = 1, …, U. Prior: a_m(u) ~ N(0, B_m⁻²).

Use a hyperparameter that is constant during the period U [Sato et al. 04].

[Figure: time courses (labeled a′ and b) over time u, with the window of length U indicated.]
Summary of setting
• Observable: Y = {y(u)}, u = 1, …, U
• Parameter: A = {a(u)}, u = 1, …, U
• Hyperparameter (constant during U): B⁻²
• n : # of samples
• Constant matrix: L (lead field matrix)
• Model: y(u) = L a(u) + ε(u)
• Priors: the m-th element a_m(u) follows a d-dimensional normal N(0, B_m⁻² I_d), where I_d is the identity matrix.
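A sketch of synthetic data generated under this setting (the dimensions d, M, N, U and the variances are illustrative assumptions, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)

d, M, N, U = 3, 10, 8, 50            # d-dim current at M sites, N detectors, U time points
L = rng.standard_normal((N, d * M))  # lead field matrix, constant over time

# ARD prior: the m-th element a_m(u) ~ N_d(0, B_m^{-2} I_d); most B_m^{-2} are
# tiny, so only a few sites are effectively active (the target of relevance
# determination).
B_inv2 = np.full(M, 1e-4)
B_inv2[[1, 4]] = 1.0                 # two "relevant" sites

A = np.stack([
    np.concatenate([np.sqrt(B_inv2[m]) * rng.standard_normal(d) for m in range(M)])
    for _ in range(U)
])                                   # shape (U, d*M)
Y = A @ L.T + 0.05 * rng.standard_normal((U, N))   # y(u) = L a(u) + noise
```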
Variational condition

Variational method: minimize the free energy F(r) over the trial posterior r(A, B⁻²).

Restriction (factorization): r(A, B⁻²) = r_A(A) r_B(B⁻²).
Theorem 1: The VB estimator of the m-th element is given by a positive-part shrinkage of its ML estimate, but the defining equation is not explicit! The shrinkage factor depends on the estimator itself, so the solution must be obtained iteratively. The HVB solution is similar to the positive-part JS estimator, with a degree of shrinkage proportional to U.
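The implicit form suggests a fixed-point iteration. The sketch below is only a schematic of that structure, assuming a JS-like shrinkage whose strength scales with U; it is not the talk's exact update rule, which is not preserved in the source:

```python
import numpy as np

def hvb_like_shrinkage(a_ml, U, c, n_iter=100, tol=1e-10):
    """Schematic fixed-point iteration for an implicit positive-part
    shrinkage factor: s <- max(0, 1 - c*U / ||s * a_ml||^2).
    `c` lumps together the constants of the (unpreserved) exact rule."""
    s = 1.0
    for _ in range(n_iter):
        denom = np.sum((s * a_ml) ** 2)
        s_new = max(0.0, 1.0 - c * U / denom) if denom > 0 else 0.0
        if abs(s_new - s) < tol:
            break
        s = s_new
    return s * a_ml
```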
Contents
• Introduction
  • Linear inverse problem
  • Hierarchical variational Bayes [Sato et al. 04]
  • James-Stein estimator
  • Purpose
• Theoretical analysis
  • Setting
  • Solution
• Discussion
• Conclusions
Proposition: Simply use the positive-part JS estimator applied to the minimum-norm solution. This only requires calculating the Moore-Penrose inverse, whereas HVB needs iterative calculation.
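A minimal sketch of this proposition (how the shrinkage constant is set per element is an assumption here, mirroring the K − 2 of the classical JS estimator; the talk's exact formula is not preserved):

```python
import numpy as np

def pjs_inverse(L, y, n, d=3):
    """Positive-part JS applied elementwise to the minimum-norm solution.
    Each 'element' is one d-dimensional block (one site); L has d*M columns."""
    a_mn = np.linalg.pinv(L) @ y              # minimum-norm (Moore-Penrose) solution
    a_hat = a_mn.copy()
    for m in range(a_mn.size // d):
        block = a_mn[m * d:(m + 1) * d]
        ss = np.sum(block**2)
        factor = max(0.0, 1.0 - (d - 2) / (n * ss)) if ss > 0 else 0.0
        a_hat[m * d:(m + 1) * d] = factor * block
    return a_hat
```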
Difference between VB and JS
• When the L_m's are orthogonal: asymptotically equivalent.
• When all L_m's are parallel or orthogonal: JS suppresses overfitting more than HVB (enhances relevance determination).
• Otherwise: future work.
Contents
• Introduction
  • Linear inverse problem
  • Hierarchical variational Bayes [Sato et al. 04]
  • James-Stein estimator
  • Purpose
• Theoretical analysis
  • Setting
  • Solution
• Discussion
• Conclusions
Conclusions & future work
• Conclusions
  • HVB provides results similar to JS estimation in the linear inverse problem.
  • The time duration U affects learning (large U enhances relevance determination).
• Future work
  • Difference from JS.
  • Bounds on the generalization error.

[Figure: time courses (labeled a′ and b) over time u, with the window U indicated.]