Bayesian Semi-Parametric Multiple Shrinkage. Paper by Richard F. MacLehose and David B. Dunson. Duke University Machine Learning Group. Presented by Lu Ren.
Outline • Motivation • Model and Lasso Prior • Semi-Parametric Multiple Shrinkage Priors • Posterior Computation • Experiment Results • Conclusions
Motivation • 1. Non-identified effects are commonplace in high-dimensional or correlated data, such as gene microarray data. • 2. Standard techniques use independent normal priors centered at zero, with the degree of shrinkage controlled by the prior variance. • 3. When sufficient prior knowledge is available, coefficients can be assumed exchangeable within specific groups and allowed to shrink toward different means. • 4. When such prior knowledge is lacking, the paper proposes a Bayesian semiparametric hierarchical model that places a Dirichlet process (DP) prior on the unknown mean and scale parameters.
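Model and Lasso Prior Suppose we collect data $(y_i, x_i)$, where $x_i$ is a vector of $p$ predictors and $y_i$ is a binary outcome. A standard approach is to estimate the coefficients $\beta$ in a regression model for $y_i$ given $x_i$. For large $p$, maximum likelihood estimates tend to have high variance and may not be unique. However, we can incorporate a penalty by placing a lasso prior, a double exponential (DE) distribution centered at zero, on each coefficient. The DE distribution is equivalent to a scale mixture of normals in which the normal variance has an exponential mixing distribution.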
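The equivalence between the DE distribution and a scale mixture of normals can be checked by simulation. The sketch below is only illustrative: it assumes a DE density of the form (lam / 2) * exp(-lam * |b|), with `lam` acting as a rate, and an exponential mixing distribution with rate lam^2 / 2 on the normal variance. The parameterization and all function names are assumptions, not taken from the slides.

```python
import math
import random

def sample_de_via_mixture(lam, rng):
    """Draw one DE(0, lam) variate as a normal with an exponentially distributed variance."""
    # Variance tau ~ Exponential(rate = lam^2 / 2); expovariate takes the rate.
    tau = rng.expovariate(lam ** 2 / 2.0)
    return rng.gauss(0.0, math.sqrt(tau))

def sample_de_direct(lam, rng):
    """Draw one DE(0, lam) variate directly: exponential magnitude with a random sign."""
    magnitude = rng.expovariate(lam)
    return magnitude if rng.random() < 0.5 else -magnitude

rng = random.Random(0)
lam = 2.0          # hypothetical rate, chosen only for the demonstration
n = 100_000
mix = [sample_de_via_mixture(lam, rng) for _ in range(n)]
direct = [sample_de_direct(lam, rng) for _ in range(n)]

# Under this parameterization both samples should have E|b| = 1 / lam.
mean_abs_mix = sum(abs(b) for b in mix) / n
mean_abs_direct = sum(abs(b) for b in direct) / n
```

Both Monte Carlo averages should land close to 1 / lam, which is what makes a Gibbs update of the coefficients from a normal distribution possible once the mixing variances are sampled.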
Multiple Shrinkage Prior In many situations, shrinkage toward non-null values is beneficial. Instead of inducing shrinkage only toward zero, the lasso model is extended with a mixture prior in which each coefficient has its own prior location and scale parameters. The data play more of a role in the choice of these hyperparameters, while a carefully tailored hyperprior favors sparsity. The DP prior is non-parametric and allows clustering of the parameters, which helps reduce dimensionality.
Multiple Shrinkage Prior In the proposed prior structure, the amount of shrinkage a coefficient exhibits toward its prior mean is determined by the second (scale) parameter of its DE prior, with larger values resulting in greater shrinkage. The hyperparameters are therefore specified to make the prior as sparse as possible.
Multiple Shrinkage Prior The DP prior groups the coefficient-specific hyperparameter values into clusters. The number of clusters is controlled by the DP precision parameter, and the coefficients are adaptively shrunk toward non-zero locations. The prior has an equivalent stick-breaking form, in which the weight of each cluster is built from independent beta-distributed random variables.
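A stick-breaking construction of DP weights can be sketched as follows. This is the standard construction with fractions drawn from Beta(1, alpha), truncated at a fixed number of sticks for illustration; the names `alpha`, `num_sticks`, and `stick_breaking_weights` are assumptions, and the slides' own notation is not recoverable from this transcript.

```python
import random

def stick_breaking_weights(alpha, num_sticks, rng):
    """Return the first `num_sticks` stick-breaking weights and the leftover mass."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any cluster
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining

rng = random.Random(42)
weights, leftover = stick_breaking_weights(alpha=1.0, num_sticks=20, rng=rng)
```

The weights and the leftover mass always sum to one, and a smaller `alpha` concentrates most of the mass on the first few clusters.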
Multiple Shrinkage Prior • A small DP precision parameter makes the number of clusters increase more slowly than the number of coefficients. • Choosing a relatively large variance for the base distribution of the locations gives support to a wide range of possible prior means.
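How slowly the number of clusters grows can be made concrete with the standard formula for the prior expected number of clusters under a DP with precision `alpha` and `p` items, namely the sum of alpha / (alpha + i) for i = 0, ..., p - 1. The function name and the example values below are assumptions chosen for illustration.

```python
def expected_num_clusters(alpha, p):
    """Prior expected number of distinct clusters among p draws from a DP with precision alpha."""
    return sum(alpha / (alpha + i) for i in range(p))

# Hypothetical comparison for p = 200 coefficients.
few_clusters = expected_num_clusters(0.5, 200)
more_clusters = expected_num_clusters(5.0, 200)
```

With `alpha = 1` and 200 coefficients the expected count is under six clusters, so growth is roughly logarithmic in the number of coefficients.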
Multiple Shrinkage Prior • Treat coefficients falling within a small range around zero as having no meaningful biologic effect. • Default prior specification: • 1. Choose smaller values for the shrinkage hyperparameters, large enough to encourage shrinkage but not so large as to overwhelm the data. • 2. Specify the hyperparameters so that the DE prior has prior credible intervals of unit width. • 3. Set the hyperparameters of the prior on the locations to assign 95% probability to a very wide range of reasonable prior effects.
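Multiple Shrinkage Prior Some testing methods: Treat coefficients within a small interval around zero as null, and let an indicator mark each predictor as having some effect, which it does with some posterior probability. From the MCMC output, we can estimate this posterior probability for each predictor; or we can estimate the posterior expected false discovery rate (FDR) for a given threshold on these probabilities; or we can simply list the predictors ordered by their posterior probabilities of having an effect.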
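The three summaries above can be sketched from a matrix of MCMC draws. The code uses the usual posterior expected FDR estimator (the average of one minus the posterior probability over the flagged predictors), which may differ in detail from the formula on the original slide. The names `draws`, `delta`, and `threshold` are hypothetical.

```python
def effect_probabilities(draws, delta):
    """Estimate P(|beta_j| > delta | data) for each coefficient from MCMC draws.

    `draws` is a list of draws, each a list with one value per coefficient.
    """
    num_draws = len(draws)
    p = len(draws[0])
    return [sum(1 for d in draws if abs(d[j]) > delta) / num_draws for j in range(p)]

def expected_fdr(probs, threshold):
    """Posterior expected FDR when flagging predictors with probability above `threshold`."""
    flagged = [q for q in probs if q > threshold]
    if not flagged:
        return 0.0  # nothing flagged, so no false discoveries
    return sum(1.0 - q for q in flagged) / len(flagged)

def rank_predictors(probs):
    """Indices of predictors ordered from most to least likely to have an effect."""
    return sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)

# Tiny made-up example: 4 draws of 2 coefficients.
example_draws = [[0.5, 0.0], [0.6, 0.2], [0.4, 0.01], [0.7, 0.0]]
example_probs = effect_probabilities(example_draws, delta=0.1)
```

Lowering `threshold` flags more predictors at the cost of a higher expected FDR, so the threshold can be tuned to a target rate.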
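Posterior Computation Assume the outcome occurs when a latent variable exceeds zero. • 1a. Augment the data with latent variables sampled from their full conditional distribution. • 1b. Update the remaining parameters of the latent-variable model by sampling from their full conditional distribution. • 2. Update the regression coefficients, using the current values of the latent variables and prior parameters, by sampling from their full conditional distribution.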
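As an illustration of the augmentation in step 1a, the sketch below assumes a probit link with unit latent variance, so that each latent variable is a normal draw truncated to the side of zero implied by the outcome. The slides do not state the link function, so this choice, along with the names `sample_latent`, `mean`, and `rng`, is an assumption rather than the authors' specification.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for its CDF and inverse CDF

def sample_latent(y, mean, rng):
    """Draw z ~ N(mean, 1) truncated to z > 0 when y == 1 and to z <= 0 when y == 0.

    Uses inverse-CDF sampling, which is adequate for moderate values of `mean`;
    extreme tails would need a dedicated truncated-normal sampler.
    """
    p_neg = _STD_NORMAL.cdf(-mean)  # P(z <= 0) before truncation
    lo, hi = (p_neg, 1.0) if y == 1 else (0.0, p_neg)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return mean + _STD_NORMAL.inv_cdf(u)

rng = random.Random(7)
z_pos = [sample_latent(1, -1.0, rng) for _ in range(1000)]  # outcomes observed
z_neg = [sample_latent(0, 1.0, rng) for _ in range(1000)]   # outcomes not observed
```

Here `mean` stands for the linear predictor of one observation under the current coefficients; every draw lands on the side of zero that matches its outcome.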
Posterior Computation The full conditional distribution in step 2 involves two diagonal matrices. • 3. Update the mixing parameters. • 4a. Update the prior location and scale parameters using a modified version of the retrospective stick-breaking algorithm.
Posterior Computation • 4b. Sample the remaining prior parameters from their conditional distribution. • 4c. Update the vector of coefficient configurations using a Metropolis step.
Posterior Computation To determine the proposal configuration for a coefficient, sample a random variable and use it to select a bin. If the draw falls beyond the bins represented so far, draw new values of the bin parameters from their prior until the draw is covered. The new proposed configuration moves the coefficient to the selected bin, and the move from the current configuration to the proposed one is accepted with a Metropolis acceptance probability.
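The "draw from the prior until covered" idea behind retrospective sampling can be sketched as below. The code only handles the stick-breaking fractions, assuming a uniform draw inverted through the cumulative weights and fractions distributed as Beta(1, alpha); in the full algorithm each new bin would also receive location and scale values from their prior, and the proposal would then go through the Metropolis accept/reject step. All names are hypothetical.

```python
import random

def propose_bin(sticks, alpha, rng):
    """Select a bin index by inverting the cumulative stick-breaking weights.

    `sticks` holds the fractions represented so far and is extended in place
    with prior draws whenever the uniform draw falls beyond the mass they cover.
    """
    u = rng.random()
    cumulative = 0.0
    remaining = 1.0  # mass not yet assigned to bins 0..h-1
    h = 0
    while True:
        if h == len(sticks):
            sticks.append(rng.betavariate(1.0, alpha))  # new bin drawn from the prior
        cumulative += sticks[h] * remaining
        # The second condition guards against floating-point stalls near u = 1.
        if u < cumulative or remaining < 1e-12:
            return h
        remaining *= 1.0 - sticks[h]
        h += 1

rng = random.Random(1)
sticks = []  # start with no represented bins
proposals = [propose_bin(sticks, alpha=1.0, rng=rng) for _ in range(50)]
```

Because bins are created only when a draw requires them, the infinite mixture never has to be truncated in advance.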
Experiment Results • 1. Simulation with 50 data sets, in two settings: • a. 400 observations and 20 parameters, with 10 of the parameters having a true effect of 2 and the remaining 10 having a true effect of 0. • b. 100 observations and 200 parameters, 10 of which have a true effect of 2 while the remaining 190 have a true effect of 0. • The results show that the multiple shrinkage prior improves on the standard Bayesian lasso, and that the reduction in MSE is largely a result of decreased bias. • a: in the first setting, the first 10 coefficients are estimated with MSE = 0.03, compared with 1.08 for the standard lasso; the remaining 10 are estimated with MSE = 0.01, compared with 0.04 for the standard lasso.
Experiment Results b: in the second setting (100 observations, 200 parameters), the MSE of the 10 coefficients with an effect of 2 is much lower in the multiple shrinkage model (1.5 vs 3.2); the remaining 190 coefficients are estimated with slightly higher MSE under the multiple shrinkage prior than under the standard lasso (0.08 vs 0.01). 2. Experiments on Diabetes (Pima).
Experiment Results The multiple shrinkage prior gave a lower misclassification rate than either the standard lasso or SVM (21% versus 22% and 23%). 3. Multiple myeloma: data from 80 individuals diagnosed with multiple myeloma are analyzed to determine whether any polymorphisms are related to early age. The predictor dimension is 135.
Conclusions • The multiple shrinkage prior provides greater flexibility in both the amount of shrinkage and the values toward which coefficients are shrunk. • The new method can greatly decrease MSE (largely as a result of decreased bias), as demonstrated in the experiment results.
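The statement that an MSE reduction comes mostly from bias can be checked with the usual decomposition MSE = bias^2 + variance, computed over the estimates of one coefficient across simulated data sets. The sketch below is a generic helper; the function name and the toy numbers are made up for illustration and are not results from the paper.

```python
def mse_decomposition(estimates, truth):
    """Return (mse, bias, variance) of a list of estimates of one true value."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - truth
    variance = sum((e - mean_est) ** 2 for e in estimates) / n  # divides by n, not n - 1
    mse = sum((e - truth) ** 2 for e in estimates) / n
    return mse, bias, variance

# Toy estimates of a coefficient whose true value is 2 (hypothetical numbers).
mse, bias, variance = mse_decomposition([1.0, 1.5, 2.0], truth=2.0)
```

Comparing `bias ** 2` and `variance` between two methods shows which component accounts for a difference in MSE.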